Extremely slow operations that use the internal message bus in VMware Cloud Director 10.0, 10.1

Article ID: 325518

Updated On:

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
In a VMware Cloud Director multi-cell environment, you experience these symptoms:
  • When a cell triggers a task on vCenter Server and waits for the task completion notification through the VCD property collector updates, it waits for a very long time and receives the notification only after it makes a direct call to vCenter Server.
  • This happens only in a multi-cell setup. Operations work normally when you run a single cell, or when you shut down the other cells in a multi-cell setup.

    For example: In the log snippet below, cell02 triggered the reconfigure task task-50428 and is waiting for its completion. Cell01, where the property collector listener is running, received the task update after just 4 seconds, but cell02 did not receive it until it made a direct call to vCenter Server.

    Task update on cell01
    /opt/vmware/vcloud-director/logs/cell-runtime.log.x
    2020-04-21 08:48:44,803 | DEBUG    | ActiveMQ Session Task-20  | TaskManager                    | Handling completion update from MessageBusAdapter for task [vcId=<VC_UUID>, moref=task-50428] with state SUCCESS |

    2020-04-21 08:48:44,803 | DEBUG    | ActiveMQ Session Task-20  | WaitHandle                     | Task updated. task = [valref = [vcId=<VC_UUID>, moref=task-50428], state = SUCCESS, taskName = ReconfigVM_Task, progress = null, entityName = <VM_NAME>, errorMessage = null], polled from vc = false |


    Task status on cell02, obtained by directly polling vCenter Server at around 08:52

    2020-04-21 08:48:40,857 | INFO     | net-fabric-activity-pool-198 | VC20VirtualServer              | Invoking reconfigure vm [name = <VM_NAME>, valref = [vcId=<VC_UUID>, moref=vm-8018], changeVersion 2020-04-21T08:43:40.665367Z] for synchronizing nics. | requestId=<REQUEST_UUID>,request=POST https://example.com/api/vdc/<VDC_UUID>/action/composeVApp,requestTime=1587457340266,remoteAddress=<IP>:24200,userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...,accept=application/*+xml;version 32.0 vcd=<VCD_UUID>,task=<TASK_UUID> activity=(com.vmware.vcloud.backendbase.management.system.TaskActivity,urn:uuid:<TASK_UUID>) activity=(com.vmware.vcloud.fabric.net.activities.ConstituteNetworkedVmActivity,urn:uuid:<ACTIVITY_UUID>)

    2020-04-21 08:52:01,760 | DEBUG    | VC.TaskManager.TaskCompletionsRetriever | WaitHandle                     | Releasing waiter for task [vcId=<VC_UUID>, moref=task-50428] |

    2020-04-21 08:52:01,758 | DEBUG    | VC.TaskManager.TaskCompletionsRetriever | WaitHandle                     | Task updated. task = [valref = [vcId=<VC_UUID>, moref=task-50428], state = SUCCESS, taskName = ReconfigVM_Task, progress = null, entityName = <VM_NAME>, errorMessage = null], polled from vc = true |
  • You observe ActiveMQ certificate-related errors.
  • You see entries similar to:
    /opt/vmware/vcloud-director/logs/cell-runtime.log.x
    2020-04-20 14:34:00,520 | ERROR    | ActiveMQ BrokerService[<SERVICE_UUID>] Task-15 | TransportConnector             | Could not accept connection from tcp://<IP>:47990 : javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown |

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
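
To check a cell for both symptoms, the log patterns above can be searched with standard tools. The following is a minimal sketch, not a VMware-provided utility. The function names count_handshake_errors and task_update_sources are hypothetical, and the patterns assume the log lines look like the excerpts above.

```shell
#!/usr/bin/env bash
# Sketch: scan a cell-runtime log for the symptoms described in this article.
# The function names are illustrative; they are not part of VMware Cloud Director.

# Count ActiveMQ TransportConnector handshake failures that report certificate_unknown.
count_handshake_errors() {
    local logfile="$1"
    # grep -c prints 0 and exits non-zero when nothing matches, so ignore the exit code.
    grep -c 'TransportConnector.*SSLHandshakeException.*certificate_unknown' "$logfile" || true
}

# For one task moref, print the timestamp of every "Task updated" line together
# with whether the update came from the message bus (false) or a direct poll (true).
task_update_sources() {
    local logfile="$1" moref="$2"
    grep 'Task updated' "$logfile" \
        | grep -F "moref=${moref}]" \
        | sed -E 's/^[[:space:]]*([0-9-]+ [0-9:,]+) .*polled from vc = (true|false).*/\1 polled_from_vc=\2/'
}

# Example usage (the log path is an assumption based on the paths shown above):
#   count_handshake_errors /opt/vmware/vcloud-director/logs/cell-runtime.log
#   task_update_sources /opt/vmware/vcloud-director/logs/cell-runtime.log task-50428
```

A non-zero handshake error count, together with task updates on the triggering cell that only ever show polled_from_vc=true, matches the behavior described in the symptoms.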


Environment

VMware Cloud Director 10.x

Cause

This issue occurs in operations that require inter-cell communication in a multi-cell setup.
For example: power on, power off, compose, instantiate, and DFW update.

The same operations work normally when running with a single cell.

Resolution

To resolve this issue, check whether the JMS certificate has expired and regenerate it if necessary.
  1. Run this command:

    /opt/vmware/vcloud-director/bin/cell-management-tool jms-certificates --status
  2. If the certificate has expired, regenerate it using this command:

    /opt/vmware/vcloud-director/bin/cell-management-tool jms-certificates --certgen --force
  3. Restart all the VMware Cloud Director cells by running this command on each cell:

    service vmware-vcd restart

    Note: If the slowness persists after the restart and the exception "SSLHandshakeException: Received fatal alert: certificate_unknown" no longer appears in the logs, add the following parameter using the cell-management-tool, and then restart all the cells.

    vcloud.activities.activityRelayPollingIntervalMs = 5000
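
The article does not spell out the command for adding this parameter. One way it could be done is sketched below, assuming the manage-config subcommand of cell-management-tool with its -n (name) and -v (value) options is available in your version. Confirm this with the tool's built-in help before running it. The CMT variable and the set_relay_polling_interval function are hypothetical names used only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: set the activity relay polling interval with cell-management-tool.
# CMT is an illustrative variable; its default is the tool path used in this article.
CMT="${CMT:-/opt/vmware/vcloud-director/bin/cell-management-tool}"

set_relay_polling_interval() {
    # Default to the value given in this article (5000 ms).
    local interval_ms="${1:-5000}"
    # Assumption: "manage-config -n <name> -v <value>" is supported by your version.
    # Check the output of "$CMT manage-config --help" first.
    "$CMT" manage-config -n vcloud.activities.activityRelayPollingIntervalMs -v "$interval_ms"
}

# Example usage:
#   set_relay_polling_interval        # uses 5000
#   service vmware-vcd restart        # then restart the cells, as described above
```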


Additional Information

This article does not apply to versions of VMware Cloud Director such as 10.2, 10.3, and 10.4, in which the messaging system has changed.