VMs with stuck HCX MON tasks for gateway relocation are periodically dropping off network and become unreachable.

Article ID: 393644

Updated On:

Products

VMware HCX

Issue/Introduction

  • Multiple VMs have stuck MON tasks, for example:
    • Configuring VM to use Remote Router as relevant IP address is not present on VM or not detected by VMtools.
    • Configuring VM to use Remote Router.
  • The VMs periodically drop off the network and stop responding to pings.
  • The VMs appear to function normally, but after some time they stop responding to pings from external subnets.
  • The NSX T1 static route entries show that the /32 HCX MON route is missing for the affected VMs.
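
The route check in the last symptom can be scripted once the T1 static routes have been exported. The following is a minimal sketch, not an HCX or NSX tool: it assumes the routes are available as a list of dictionaries with a `network` field in CIDR notation and that the IP addresses of the MON-enabled VMs are known. The function name, data shape, and sample addresses are hypothetical.

```python
import ipaddress

def find_missing_mon_routes(static_routes, vm_ips):
    """Return the VM IPs that have no matching /32 host route.

    static_routes: list of dicts with a "network" key in CIDR form
                   (assumed export format, adjust to your data).
    vm_ips:        list of IPv4 address strings for MON-enabled VMs.
    """
    host_routes = set()
    for route in static_routes:
        net = ipaddress.ip_network(route["network"], strict=False)
        # Only /32 IPv4 host routes count as per-VM MON routes here.
        if net.version == 4 and net.prefixlen == 32:
            host_routes.add(net.network_address)
    return [ip for ip in vm_ips if ipaddress.ip_address(ip) not in host_routes]

# Made-up sample data for illustration only.
routes = [
    {"network": "10.10.20.11/32"},
    {"network": "10.10.20.0/24"},
]
vms = ["10.10.20.11", "10.10.20.12"]
print(find_missing_mon_routes(routes, vms))  # → ['10.10.20.12']
```

Any VM returned by the function is a candidate for the behavior described in this article and is worth correlating with the stuck MON tasks.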

Environment

HCX Network Extension with MON feature enabled

 

Cause

Unstable communication between HCX Manager and vCenter can cause VMUpdateJobs to queue on the vCenter side. When the connection is restored, HCX Manager (HCX-MGR) receives all pending jobs at once. The issue can be exacerbated if more than 150 VMs use Mobility Optimized Networking (MON) and HCX-MGR is running with the default CPU and memory settings.
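
The aggravating condition described above can be expressed as a simple check. The sketch below is illustrative only: the 150-VM threshold comes from this article, while the default sizing of 4 vCPU and 12 GB is an assumption that should be verified against the deployed HCX Manager appliance. The function name is hypothetical.

```python
MON_VM_THRESHOLD = 150   # threshold stated in this article
DEFAULT_VCPU = 4         # assumed default HCX-MGR sizing, verify locally
DEFAULT_MEM_GB = 12      # assumed default HCX-MGR sizing, verify locally

def at_risk_of_job_backlog(mon_vm_count, vcpu, mem_gb):
    """Return True when the MON VM count exceeds the threshold and
    HCX-MGR is still at (or below) the assumed default sizing."""
    default_sized = vcpu <= DEFAULT_VCPU or mem_gb <= DEFAULT_MEM_GB
    return mon_vm_count > MON_VM_THRESHOLD and default_sized

print(at_risk_of_job_backlog(200, 4, 12))   # → True
print(at_risk_of_job_backlog(200, 8, 24))   # → False
```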

Examples of actions that create VMUpdateJobs to be sent to HCX:

  • vMotions (DRS or manual).
  • VM power on/off.
  • VM reconfiguration (edit settings).
  • VM NIC interface enable/disable.

A related indicator is HCX-MGR CPU utilization at 100%.

 

Resolution

  • Follow the steps in KB:321640 to increase HCX-MGR resources to 8 vCPU and 24 GB of memory. This helps HCX-MGR process outstanding VMUpdateJobs more efficiently.
  • Investigate and resolve any connectivity issues between vCenter and HCX. Confirm that all connections appear healthy in the HCX Manager interface on port 9443.
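
As a first step before opening the interface, basic TCP reachability of port 9443 can be confirmed from a workstation. The sketch below is a minimal example using only the Python standard library; the function name is hypothetical and the host name in the usage comment is a placeholder. It confirms only that the port accepts connections, not that the HCX to vCenter connection itself is healthy.

```python
import socket

def port_is_reachable(host, port=9443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Usage (placeholder host name, replace with your HCX Manager FQDN or IP):
# port_is_reachable("hcx-manager.example.com")
```

The same function can be pointed at vCenter on its own management port to narrow down which side of the connection is unstable.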

If you believe you have encountered this issue and the MON jobs do not progress after increasing HCX Manager resources, gather the following information and open a support request with Broadcom. For more information, see Creating and managing Broadcom support cases.

  • HCX Version in use on both sites.
  • Name of VM with stuck MON job.
  • Service Mesh servicing the Network Extension.
  • Details of the network being extended (IP, subnet, VLAN, network name).
  • Source and destination HCX log bundles (include DB), plus vCenter and NSX log bundles (if applicable at source).