As a temporary workaround, perform a rolling reboot of all cells or, alternatively, restart the vmware-vcd service on every cell using the command: service vmware-vcd restart
The following configuration changes can be implemented to reduce the impact of this issue. Note: the cell-management-tool commands listed below only need to be executed on the primary cell.
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n "jms.cluster.connectionTTL" -v "90000"
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n "jms.cluster.clientFailureCheckPeriod" -v "45000"
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vc-task-completions-retrieval-timer-interval-sec -v 60
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.activities.activityRelayPollingIntervalMs -v 60000
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n InventoryWait -v 600000
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n event.processor.running.duration.millisec -v 120000
service vmware-vcd restart
To check whether a value is already in place, run the same command with the "-l" option in place of "-v" and its value.
Example:
# /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n "jms.cluster.connectionTTL" -l
Property "jms.cluster.connectionTTL" has value "90000"
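To check all six properties in one pass, the "-l" form can be wrapped in a small loop. This is only a sketch: the function name list_jms_settings and the CMT override variable are illustrative and not part of the product; the tool path and property names are the ones used in the commands above.

```shell
#!/bin/sh
# Tool path as used in the commands above.
# CMT is a hypothetical override variable, handy for testing.
CMT="${CMT:-/opt/vmware/vcloud-director/bin/cell-management-tool}"

# Print the current value of each property changed in this article.
# list_jms_settings is an illustrative name, not a product command.
list_jms_settings() {
    for prop in \
        jms.cluster.connectionTTL \
        jms.cluster.clientFailureCheckPeriod \
        vc-task-completions-retrieval-timer-interval-sec \
        vcloud.activities.activityRelayPollingIntervalMs \
        InventoryWait \
        event.processor.running.duration.millisec
    do
        "$CMT" manage-config -n "$prop" -l
    done
}
```

Calling list_jms_settings on the primary cell runs the "-l" query once per property, so any value that differs from the ones set above stands out in the output.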
Run the following command on every cell. Each cell should report the same number of nodes and members for both defaultTopology=topology and vcd-cluster=topology:
tac /opt/vmware/vcloud-director/logs/cell-runtime.log | grep -m 1 "members="
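For repeated checks, the same pipeline can be wrapped in a function that takes the log path as an optional argument. A minimal sketch: the function name last_members_line is illustrative, and it assumes GNU tac and grep are available on the cell.

```shell
#!/bin/sh
# Print the most recent log line containing "members=".
# last_members_line is an illustrative name; the default path is the one used above.
last_members_line() {
    log="${1:-/opt/vmware/vcloud-director/logs/cell-runtime.log}"
    # tac reverses the file so that grep -m 1 stops at the newest match.
    tac "$log" | grep -m 1 "members="
}
```

Running last_members_line on each cell and comparing the printed lines side by side makes a mismatch in node or member counts easier to spot.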