HCX Migrations Stuck Due to Cloud Manager Out of Memory and Job Table Bloat


Article ID: 439304



Products

VMware HCX

Issue/Introduction

  • Site pairing flaps between the Pending and Connected states.
  • Migration jobs remain stuck in the queue and fail to execute.
  • Cloud HCX Manager logs show frequent Out of Memory (OOM) exceptions.
  • Large .hprof (Java heap dump) files are continuously created on the Cloud HCX Manager filesystem, correlating with JVM crashes.

Out of Memory events in /common/logs/admin/app.log:
---------------[ Fatal Error running the job ]----------------
2026-04-15 04:44:50.541 UTC [TopologyService_SvcThread-1578, Ent: HybridityAdmin, , TxId: 7a07c115-0bf6-4e33-9caa-e7c59d5e0b5b] ERROR c.v.v.h.messaging.LoggingJobWrapper- java.lang.OutOfMemoryError: Java heap space
        at org.apache.http.util.CharArrayBuffer.expand(CharArrayBuffer.java:60)
        at org.apache.http.util.CharArrayBuffer.append(CharArrayBuffer.java:90)
        at org.apache.http.util.EntityUtils.toString(EntityUtils.java:228)
        at org.apache.http.util.EntityUtils.toString(EntityUtils.java:308)
        at com.vmware.vchs.hybridity.adapters.https.HttpsAdapter.internalExecute(HttpsAdapter.java:278)
        at com.vmware.vchs.hybridity.adapters.https.HttpsAdapter.executePost(HttpsAdapter.java:485)
        at com.vmware.vchs.hybridity.adapters.https.HttpsAdapter.executePost(HttpsAdapter.java:467)


---------------[ Fatal Error running the job ]----------------
2026-04-15 06:27:37.094 UTC [InterconnectService_SvcThread-1567, J:c09d945f, , TxId: 042e4529-d32a-4a10-8f07-e779133beb6e] ERROR c.v.v.h.messaging.LoggingJobWrapper- java.lang.OutOfMemoryError: Java heap space

2026-04-15 06:27:38.124 UTC [MobilityTransferService_SvcThread-5674, Ent: HybridityAdmin, , TxId: 4fbcfb48-06c6-48f8-b588-3d00ce529cfb] WARN c.v.v.h.m.k.KafkaProducerDelegate- Publish failed and will retry
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for MobilityTransferJob-0:187697 ms has passed since batch creation
        at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:98)
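
To gauge how often the heap is being exhausted and which services are hit, the OOM entries in the log can be tallied per service thread. The following is a minimal sketch, not a Broadcom-provided tool: the function name count_oom_by_service is hypothetical, and the parsing assumes every relevant entry follows the "[<Service>_SvcThread-<id>, ..." layout seen in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally "java.lang.OutOfMemoryError" log entries per
# service, based on the "[<Service>_SvcThread-<id>," prefix shown in the
# excerpt. The default path is the log file named in this article.
count_oom_by_service() {
    local log_file="${1:-/common/logs/admin/app.log}"
    awk '
        /java\.lang\.OutOfMemoryError/ {
            # Locate the "[<Service>_SvcThread-<id>" token on the line.
            if (match($0, /\[[A-Za-z]+_SvcThread-[0-9]+/)) {
                svc = substr($0, RSTART + 1, RLENGTH - 1)
                sub(/_SvcThread-[0-9]+$/, "", svc)   # keep the service name only
                count[svc]++
            }
        }
        END { for (s in count) print count[s], s }
    ' "$log_file" | sort -k1,1nr -k2
}
```

Calling count_oom_by_service with no argument reads /common/logs/admin/app.log and prints one "<count> <service>" line per affected service, highest count first. Stack-trace lines are skipped because they do not carry the OutOfMemoryError marker.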

To check the size of the job table, run the following query against the Cloud HCX Manager database:
select count(*) from job;

 

Environment

VMware HCX 9.0.1

Cause

The Cloud HCX Manager runs out of memory because of job table bloat. High vMotion event churn on the cloud side generates an excessive volume of VM events. As the manager processes these events, the internal job table grows beyond capacity, consuming all available heap memory and crashing the JVM. As a result, site pairing flaps and the migration orchestrator cannot pick up queued jobs.

Resolution

  • Open a Support Request with Broadcom Support and reference this issue to obtain an environment-specific remediation plan.
  • Share support bundles from both the HCX Connector and the HCX Cloud Manager, including the database dump and core dumps.
  • If support bundle collection fails, collect the bundle from the CLI with the following commands:


# cd /opt/vmware/tools/
# ./export_tech_support_bundle.sh -l -d
Options:
 -l  Export Core HCX Manager logs
 -d  Export Database Dump

  • After collecting the files for Broadcom review, manually delete the large .hprof files from the HCX Manager filesystem to reclaim partition space and prevent potential disk-full outages.
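
Before deleting anything, it can help to see which heap dumps exist and how much space each one takes. The following is a minimal sketch under stated assumptions: the function name list_heap_dumps is hypothetical, and the directory to search must be supplied by the operator because this article does not state where the .hprof files are written. It only lists files, so deletion remains a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list .hprof heap dumps under a directory, largest first.
# Prints "<size in KiB><TAB><path>" per file and deletes nothing.
list_heap_dumps() {
    # The search directory is required; it is an operator-supplied assumption.
    local search_root="${1:?usage: list_heap_dumps <directory>}"
    find "$search_root" -type f -name '*.hprof' -exec du -k {} + 2>/dev/null \
        | sort -rn
}
```

Running list_heap_dumps against the directory that holds the dumps, together with df -h on the same partition, shows how much space removing the files would reclaim once they have been handed over to Broadcom Support.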

Additional Information

Engineering is developing a permanent fix for this behavior, which is expected to be released in version 9.1.1. This update will ensure the DVPG is properly unregistered during the unextend process, fully resolving the issue moving forward.