Extremely slow operations that use the internal message bus in VMware Cloud Director 10.0, 10.1
Article ID: 325518
Products
VMware Cloud Director
Issue/Introduction
Symptoms: In a VMware Cloud Director multi-cell environment, you experience these symptoms:
A cell that triggers a task on the vCenter Server and waits for the task completion notification, which is delivered as part of the VCD property collector updates, waits for a very long time and receives the notification only after it makes a direct call to the vCenter Server.
This happens only in a multi-cell deployment. Operations work normally with a single cell, or when the other cells in a multi-cell setup are shut down.
For example, in the log snippet below, cell02 triggered the reconfigure task task-50428 and is waiting for its completion. Cell01, where the property collector listener is running, received the task update after just 4 seconds, but cell02 did not receive it until it made a direct call to the vCenter Server.
Task update on cell01, in /opt/vmware/vcloud-director/logs/cell-runtime.log.x:
2020-04-21 08:48:44,803 | DEBUG | ActiveMQ Session Task-20 | TaskManager | Handling completion update from MessageBusAdapter for task [vcId=<VC_UUID>, moref=task-50428] with state SUCCESS |
You see entries similar to the following in /opt/vmware/vcloud-director/logs/cell-runtime.log.x:
2020-04-20 14:34:00,520 | ERROR | ActiveMQ BrokerService[<SERVICE_UUID>] Task-15 | TransportConnector | Could not accept connection from tcp://<IP>:47990 : javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown |
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
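To confirm whether a cell is logging the handshake failure shown above, the log files can be searched for the exception text. The following is a minimal sketch, not a supported tool: the function name count_handshake_errors is hypothetical, and the glob in the usage comment assumes the rotated logs follow the cell-runtime.log.x naming shown in the excerpts.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that contain the SSL handshake
# failure described in this article. Pass one or more log files.
count_handshake_errors() {
    # -F matches the text literally; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps a count
    # of 0 from being treated as a failure.
    cat -- "$@" | grep -c -F -- 'SSLHandshakeException: Received fatal alert: certificate_unknown' || true
}

# Example usage on a cell (path taken from the log excerpts above):
#   count_handshake_errors /opt/vmware/vcloud-director/logs/cell-runtime.log*
```

A non-zero count on any cell suggests that the inter-cell message bus connections are being rejected. Run the check on every cell, because the entry is logged by the cell that refuses the connection.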
Environment
VMware Cloud Director 10.x
Cause
This issue occurs in operations that require inter-cell communication in a multi-cell setup, for example power on, power off, compose, instantiate, and DFW update.
The same operation works fine if running with a single cell.
Resolution
To resolve this issue, check if the certificate is expired.
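One way to check expiry is with the openssl command-line tool, as sketched below. This is an illustration under stated assumptions: it expects the certificate to be available as a PEM file, the function name cert_status and the path in the usage comment are hypothetical, and this article does not state where the cell stores its certificate or in which format, so it may need to be exported first.

```shell
#!/bin/sh
# Hypothetical helper: report whether a PEM-encoded certificate has expired.
# Prints "valid", "expired", or "unreadable".
cert_status() {
    # First make sure the file parses as an X.509 certificate at all.
    if ! openssl x509 -in "$1" -noout >/dev/null 2>&1; then
        echo "unreadable"
        return 2
    fi
    # -checkend 0 exits 0 if the certificate is still valid right now,
    # and non-zero if it has already expired.
    if openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired"
        return 1
    fi
}

# Example usage (placeholder path, not a documented location):
#   cert_status /tmp/cell-certificate.pem
#   openssl x509 -in /tmp/cell-certificate.pem -noout -enddate
```

The -enddate option in the usage comment prints the notAfter date, which helps when planning a renewal before the certificate lapses.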
Restart all the vCD cells by running this command:
service vmware-vcd restart
Note: If the slowness persists after the restart and the exception "SSLHandshakeException: Received fatal alert: certificate_unknown" no longer appears in the logs, add this parameter using the cell-management-tool and restart all the cells.