HCX Site-Pair Status Stuck in Pending Due to Connector Resource Exhaustion

Article ID: 433177

Updated On:

Products

VMware HCX

Issue/Introduction

  • VMware HCX Site-Pairs show a status mismatch: the source side remains in "Pending" while the cloud side shows "Disconnected".
  • This occurs when a single site-pair (often one that was decommissioned but never removed) generates a very large number of jobs (3000+), which leads to high CPU usage and OutOfMemory (OOM) conditions in the app-engine.

Symptoms include:

  • Connector polling jobs failing to execute.

  • "Edit Site-Pair" operations failing to synchronize.

  • java.lang.OutOfMemoryError observed in app-engine logs (/common/logs/admin/app.log) during service restart or job parsing. 

  • Rapid CPU consumption by the Java process (java -cp /opt/vmware/resources:/opt/vmware/app/*:/opt/vmware/third-). This is visible in top output sorted by CPU usage, with the most CPU-intensive task listed first (in top, Shift+P sorts by %CPU and Shift+R reverses the sort order).

    Top output:
    
    top - 21:16:13 up  1:09,  3 users,  load average: 7.50, 7.70, 7.45
    Tasks: 238 total,   1 running, 237 sleeping,   0 stopped,   0 zombie
    %Cpu0  :  96.0/0.7    97[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||   ]   %Cpu1  :  95.4/0.0    95[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||    ]
    %Cpu2  :  96.7/0.0    97[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||   ]   %Cpu3  :  95.4/0.0    95[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||    ]
    %Cpu4  :  97.3/0.0    97[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  ]   %Cpu5  :  96.6/0.0    97[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||   ]
    %Cpu6  :  96.0/0.0    96[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||    ]   %Cpu7  :  95.4/0.7    96[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||   ]
    GiB Mem : 31.3/23.5     [                                                                                          ]   GiB Swap:  0.0/4.0      [                                                                                          ]
    
      PID USER      PR  NI    VIRT    RES  %CPU  %MEM     TIME+ S COMMAND
     5405 admin     20   0 8004.8m   2.6g 767.5  11.2 343:41.98 S java -cp /opt/vmware/resources:/opt/vmware/app/*:/opt/vmware/third-

  • A Java OutOfMemoryError is logged in app.log, for example:
    <timestamps> UTC [VsphereReplicationService_SvcThread-47, T:'ChangeManagerCacheCleaner', R:########-####-####-####-30b4c8c418f6, J:########-####-####-####-30b4c8c418f6, S:CLEAN_CACHE, , TxId: ########-####-####-####-04a9fd9cd457] ERROR c.v.v.h.messaging.LoggingJobWrapper- java.lang.OutOfMemoryError: Java heap space
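The OutOfMemoryError symptom can be confirmed by scanning app.log rather than reading it by eye. The following is a minimal Python sketch, not a Broadcom-provided tool. It assumes each affected log line follows the layout of the sample above (a bracketed header that starts with the thread name, followed by T:'<task>'), and the function names are hypothetical.

```python
import re
from collections import Counter

# Log path as given in this article; adjust if your appliance differs.
APP_LOG = "/common/logs/admin/app.log"

# Assumed header layout, based on the sample line in this article:
# [VsphereReplicationService_SvcThread-47, T:'ChangeManagerCacheCleaner', ...]
HEADER = re.compile(r"\[(?P<thread>[^,\]]+),\s*T:'(?P<task>[^']*)'")


def find_oom_events(lines):
    """Return a (thread, task) pair for every line reporting an OutOfMemoryError."""
    events = []
    for line in lines:
        if "java.lang.OutOfMemoryError" not in line:
            continue
        match = HEADER.search(line)
        if match:
            events.append((match.group("thread"), match.group("task")))
        else:
            # Line mentions the error but does not match the assumed layout.
            events.append(("unknown", "unknown"))
    return events


def summarize(events):
    """Count OutOfMemoryError events per task name."""
    return Counter(task for _thread, task in events)


def scan_file(path=APP_LOG):
    """Read a log file and return the per-task OutOfMemoryError counts."""
    with open(path, errors="replace") as handle:
        return summarize(find_oom_events(handle))
```

Running `scan_file()` on the HCX Manager (or against an app.log extracted from a support bundle) returns a counter keyed by task name, which gives a quick view of which jobs hit the heap limit most often.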

Environment

VMware HCX 4.11.x

 

Cause

  • A decommissioned site-pair recursively accumulates jobs (3000+ sync/cancel events). The resulting resource starvation prevents the polling service from processing the Remoting-Outbox status updates.

Resolution

  • Review the KB article: HCX Site Pairing displays "Pending" status in Connector UI after upgrade to 4.11.3
  • If CPU consumption remains high after waiting for "Job Recovery" to complete as described in that KB article, open a Support Request (SR) with Broadcom Support and provide the following information:
    • Support bundles from both the source and target HCX Managers, with database dumps included:
      • Log in to the HCX Manager hybridity page at https://<HCX-Manager-FQDN>:443
      • Navigate to Administration > Troubleshooting and ensure Collect Database Dump is enabled.
    • The name of the site-pair that is in the disconnected state.
    • Details of any recent configuration changes or maintenance related to HCX or the network.
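
To judge whether CPU consumption is still elevated after "Job Recovery", the top output can be checked programmatically instead of by eye. The sketch below is illustrative only and is not part of HCX. It assumes text captured from top in batch mode (for example `top -b -n 1`); the function names and the 100% threshold are hypothetical choices. Column positions are read from the header row because the column order shown in this article differs from top's default.

```python
def parse_top_processes(top_text):
    """Parse the process table of top output into a list of dicts.

    Column names are taken from the header row (the line starting
    with PID), so custom column orders are handled.
    """
    rows = []
    columns = None
    for line in top_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if columns is None:
            # Skip the summary area until the header row is found.
            if fields[0] == "PID" and "%CPU" in fields:
                columns = fields
            continue
        # COMMAND is assumed to be the last column and may contain spaces.
        values = line.split(None, len(columns) - 1)
        if len(values) < len(columns):
            continue
        rows.append(dict(zip(columns, values)))
    return rows


def busy_java_processes(top_text, cpu_threshold=100.0):
    """Return (pid, cpu) for java processes at or above cpu_threshold.

    With top's default per-core accounting, %CPU can exceed 100 on
    multi-core systems; 100.0 is an arbitrary illustrative threshold.
    """
    result = []
    for row in parse_top_processes(top_text):
        try:
            cpu = float(row["%CPU"])
        except (KeyError, ValueError):
            continue
        if row.get("COMMAND", "").startswith("java") and cpu >= cpu_threshold:
            result.append((int(row["PID"]), cpu))
    return result
```

Applied to the top output shown under Symptoms, `busy_java_processes` reports PID 5405 at 767.5 %CPU, which on that 8-CPU appliance means the Java process is consuming nearly every core. A value that stays in this range across repeated samples is worth including in the SR.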