HCX MON Enablement Failure and L2 Extension Removal Issues Due to HA Group "SUSPENDED" State
Article ID: 423184

Updated On:

Products

VMware HCX

Issue/Introduction

During network migration activities with VMware HCX, users may encounter situations where Mobility Optimized Networking (MON) cannot be enabled or a Network Extension cannot be removed.
Even if the network appears stable, the HCX Manager GUI or logs indicate that the operation failed at the initial stage.

  • Failure to enable MON on specific segments.
  • Inability to unextend/remove an L2 extension, with the task failing immediately.
  • Error messages similar to the following are found in the HCX Manager logs under /common/admin/logs:
    <timestamp> UTC [NetworkStretchService_SvcThread-11514, j: e3b6b3d1, , TxId: <uuid>] WARN  c.v.v.h.n.i.AbstractJobInt- Exception in NetworkStretchJobs:EnablePRForExtendedNetworkHAWorkflow. Reason : HA group hagroup-<group-uuid> is not in GROUPED state: groupState=SUSPENDED, statusMessage=

    <timestamp> UTC [NetworkStretchService_SvcThread-11518, j: 69b2a7e7, , TxId: <uuid>] WARN  c.v.v.h.n.i.AbstractJobInt- Exception in NetworkStretchJobs:DisablePRForExtendedNetworkHAWorkflow. Reason : HA group not in valid state
  • A discrepancy where the HCX Interconnect UI shows the HA status as "HEALTHY" while the internal database records it as "SUSPENDED".
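The stale state can be confirmed from the command line. A minimal sketch, assuming root shell access on the HCX Manager and the log location quoted above (adjust the path if your build stores logs elsewhere):

```shell
#!/bin/sh
# Hypothetical helper: scan a log directory for the HA-group SUSPENDED
# warnings quoted above. The default path is the one named in this article.
LOGDIR="${1:-/common/admin/logs}"

# Show the five most recent matches, if any.
grep -rh "groupState=SUSPENDED" "$LOGDIR" 2>/dev/null | tail -n 5
```

Matching lines here, combined with a "HEALTHY" status in the Interconnect UI, indicate the UI/database discrepancy described above.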

Environment

VMware HCX

Cause

  • The primary cause is the HA Group of the Network Extension (NE) appliances entering and remaining in a "SUSPENDED" state.
  • Initial Connectivity Failure: During a previous configuration change (such as enabling or disabling MON), the HCX Manager lost connectivity to, or DNS resolution of, the vCenter Server.
  • Stale State: Because the connection was lost during a state transition, the HA Group "Maintenance Mode" (SUSPENDED) did not roll back or complete successfully.
  • Database Inconsistency: The HCX Manager database maintains the "SUSPENDED" state internally to prevent configuration corruption.
  • As long as this state persists, the HCX Manager blocks further modifications to that specific network extension to ensure stability.

Resolution

To resolve the state inconsistency and allow subsequent network operations, follow these steps:

  • Restore Underlying Connectivity

Ensure that the HCX Managers (both Connector and Cloud) can consistently resolve the FQDN and reach the management IP of their respective vCenter Servers.
If the DNS server itself sits on a segment that is being migrated or cut over, provide temporary static host entries or an alternative DNS path.
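The resolution and reachability checks above can be run from the HCX Manager shell. A minimal sketch; vcenter.example.com is a placeholder for your vCenter Server FQDN:

```shell
#!/bin/sh
# check_dns NAME: succeed if NAME resolves via the configured resolvers.
check_dns() {
  getent hosts "$1" > /dev/null 2>&1
}

# check_https HOST: succeed if HOST answers on port 443 within 5 seconds.
check_https() {
  curl -ks --connect-timeout 5 -o /dev/null "https://$1/"
}

# Placeholder FQDN; substitute your vCenter Server's name.
VCENTER_FQDN="vcenter.example.com"

check_dns "$VCENTER_FQDN"   && echo "DNS OK"   || echo "DNS FAILURE"
check_https "$VCENTER_FQDN" && echo "HTTPS OK" || echo "HTTPS FAILURE"
```

If DNS fails because the DNS server sits on a segment being cut over, a temporary static entry in /etc/hosts on the appliance can bridge the gap until name resolution is restored.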

  • Synchronize HA State

Attempt to force the HCX Manager to recognize the actual state of the appliances:

  • Navigate to Interconnect > Service Mesh > View Appliances.
  • Go to the HA Management tab.
  • If the RECOVER or FORCE SYNC button is available, click it to synchronize the state.

  • Restart HCX Manager Services

If the UI buttons are grayed out or the state remains SUSPENDED:

  • Restart the HCX Manager appliance or the app-engine service.

This triggers a re-validation of the managed entities and can often clear stale database flags.
For instructions on restarting the app-engine service, refer to the following steps.

  • SSH into the HCX Connector or Cloud Manager using the "admin" credentials, then change the user to "root".
  • Restart the service as shown below:
            systemctl restart app-engine
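Restarting app-engine briefly interrupts the HCX Manager UI, so it helps to wait until the service reports active before retrying the operation. A minimal sketch; wait_active is a hypothetical helper, not an HCX command:

```shell
#!/bin/sh
# wait_active CHECK_CMD TRIES: poll CHECK_CMD once per second until it
# succeeds or TRIES attempts have been made.
wait_active() {
  _cmd="$1"; _tries="${2:-30}"; _i=0
  while [ "$_i" -lt "$_tries" ]; do
    if $_cmd; then return 0; fi
    _i=$((_i + 1))
    sleep 1
  done
  return 1
}

# On the HCX Manager, as root, it would be used like this:
#   systemctl restart app-engine
#   wait_active "systemctl is-active --quiet app-engine" 60
```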

  • Force Unextend (Final Option)
    If the L2 extension must be removed and the HA Group remains stuck, use the Force Unextend option
    In the HCX UI --> Network Extension --> Select the corresponding Network Extension and Select the Force Unextend option as shown below

 

If you still need help with the issue, kindly open a support case with Broadcom. Refer to KB article 142884.

Additional Information

Note: Force Unextend removes the configuration from the HCX database regardless of the appliance state.
Ensure that no active VM traffic relies on the extension before performing this action, as it may not clean up all logical switches or router ports on the underlying NSX/vSphere platform.