Slow Task Processing in VMware Cloud Director

Products

VMware Cloud Director

Issue/Introduction

VCD environments gradually degrade, and task processing becomes progressively slower.
Common operations (powering a VM on or off, modifying a VM's configuration, instantiating a new VM or vApp, etc.) take exceptionally long to complete.
Tasks that normally take seconds or a few minutes take 10 minutes or more to complete.
This issue can usually be identified in the cell-runtime.log file: compare the expected Artemis cluster topology against the actual Artemis cluster topology.

Environment

VMware Cloud Director 10.x

Cause

The Artemis cluster is the internal mechanism used to facilitate cell-to-cell communication. On occasion, this mechanism degrades and cell-to-cell communication falters along with it. When cell-to-cell communication degrades, task processing can take considerably longer. This occurs because the cell that handles a particular task cannot relay an update of the task's completion while the internal communication mechanism is non-functional.

Resolution

This issue is resolved in VMware Cloud Director 10.4.2 available at Broadcom Downloads.

Workaround:

This issue can be temporarily bypassed by performing a rolling reboot of all cells or, alternatively, by running the vmware-vcd service only on the primary cell. For versions 10.4.x, the following configuration changes can be implemented to reduce the impact of this issue. Please note: all cell-management-tool commands listed below need to be executed only on the primary cell.

Set the connectionTTL to 90s. The default is 60s:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n "jms.cluster.connectionTTL" -v "90000"
Set the clientFailureCheckPeriod to 45s:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n "jms.cluster.clientFailureCheckPeriod" -v "45000"
Set the Task Poller retrieval interval to 60s - this polls vCenter for task updates:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vc-task-completions-retrieval-timer-interval-sec -v 60
Set the Activity Poller retrieval interval to 60s - this polls data from the activity table for completion of activities:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.activities.activityRelayPollingIntervalMs -v 60000
Set the VCD inventory timeout to 600s:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n InventoryWait -v 600000
Set the Event Processor duration to 120s:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n event.processor.running.duration.millisec -v 120000
Perform a shutdown and restart of the vmware-vcd services on ALL cells in the environment:
service vmware-vcd restart
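
The six manage-config steps above could be applied in one pass with a small wrapper. The following is a minimal sketch, not a supported tool: the function name apply_settings and the CMT and DRY_RUN variables are inventions of this example, while the property names and values are copied from the steps above.

```shell
#!/bin/sh
# Sketch only: applies the six workaround properties from the steps above.
# CMT and DRY_RUN are conveniences invented for this sketch, not tool features.
CMT="${CMT:-/opt/vmware/vcloud-director/bin/cell-management-tool}"

apply_settings() {
  # Each line of the list below is name=value, copied from the steps above.
  while IFS='=' read -r name value; do
    [ -n "$name" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command for review instead of running it.
      printf '%s manage-config -n "%s" -v "%s"\n' "$CMT" "$name" "$value"
    else
      # </dev/null keeps the tool from consuming the list on stdin.
      "$CMT" manage-config -n "$name" -v "$value" </dev/null || return 1
    fi
  done <<'EOF'
jms.cluster.connectionTTL=90000
jms.cluster.clientFailureCheckPeriod=45000
vc-task-completions-retrieval-timer-interval-sec=60
vcloud.activities.activityRelayPollingIntervalMs=60000
InventoryWait=600000
event.processor.running.duration.millisec=120000
EOF
}
```

Setting DRY_RUN=1 before calling apply_settings prints the six commands without changing anything. The wrapper does not restart any services, so the vmware-vcd restart on all cells is still required afterwards.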

Additional Information

To check whether a value is already in place, run the same command with the "-l" option instead of "-v" and a value.

Example:

# /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n "jms.cluster.connectionTTL" -l
Property "jms.cluster.connectionTTL" has value "90000"
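
The same check can be scripted to compare a property against its expected value. This is a rough sketch that assumes the tool's output always follows the format shown in the example above; parse_value, check_setting and the CMT variable are hypothetical helpers for illustration only.

```shell
#!/bin/sh
# Sketch only: compares a property's current value with an expected value.
# Assumes "-l" output looks like the example above:
#   Property "<name>" has value "<value>"
CMT="${CMT:-/opt/vmware/vcloud-director/bin/cell-management-tool}"

# Read the tool's output on stdin and print only the value
# (the closing quote is treated as optional).
parse_value() {
  sed -n 's/^Property "[^"]*" has value "\([^"]*\)"\{0,1\}$/\1/p'
}

# Usage: check_setting <property-name> <expected-value>
# Returns 0 if the current value matches, non-zero otherwise.
check_setting() {
  actual="$("$CMT" manage-config -n "$1" -l </dev/null | parse_value)"
  [ "$actual" = "$2" ]
}
```

For example, `check_setting jms.cluster.connectionTTL 90000 && echo "already set"` would report whether the connectionTTL change has been applied. If the real output format differs from the example, parse_value prints nothing and the check fails, so the result should be treated as a hint rather than proof.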

For more information, see the VMware Cloud Director 10.4.2 Release Notes.