NSX Edge Disconnected After Host Failure and Manager High CPU

Article ID: 434463

Updated On:

Products

VMware NSX

Issue/Introduction

In an NSX-T or NSX 4.x environment (for example, one backing VMware Tanzu/PKS), you may observe the following symptoms after an ESXi host crash or vSphere HA event:

  • An NSX Edge VM migrates to a surviving host but its data plane interfaces (e.g., fp-eth1) remain disconnected.
  • Attempts to manually reconnect the Edge VM adapter via the vCenter UI fail.
  • Traffic to the environment is halted, and other Edge nodes may report a "Degraded" status.
  • The primary NSX Manager node experiences high CPU utilization or is unresponsive.

Logs on the destination ESXi host:

  • hostd.log: NIC: connection control message: Failed to connect virtual device 'ethernet2'.
  • vmkernel.log: Net: 3325: Failed to associate port for OpaqueNetwork nsx.LogicalSwitch externalID
  • vmkernel.log: Net: 3410: Failed to connect port to opaque network: Not found
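
To check a host quickly for these signatures, the log lines above can be matched with a short script. This is a minimal sketch, not a supported tool: the function names and signature labels are hypothetical, and the patterns are built only from the messages quoted above. They ignore the numeric prefixes (such as "Net: 3325:"), on the assumption that these can differ between builds.

```python
import re

# Patterns taken from the log excerpts quoted in this article.
# The labels on the left are illustrative names, not NSX or ESXi identifiers.
SIGNATURES = {
    "hostd_nic_connect_failed": re.compile(
        r"Failed to connect virtual device '(?P<device>ethernet\d+)'"
    ),
    "vmkernel_port_associate_failed": re.compile(
        r"Failed to associate port for OpaqueNetwork"
    ),
    "vmkernel_opaque_connect_failed": re.compile(
        r"Failed to connect port to opaque network: Not found"
    ),
}


def scan_lines(lines):
    """Return (signature_label, line) for every line matching a known signature."""
    hits = []
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a copy of hostd.log or vmkernel.log at the given path."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

For example, `scan_file("vmkernel.log")` run against a copy of the destination host's log returns one tuple per matching line. An empty list suggests the host did not log this particular failure.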

Environment

VMware NSX

VMware Tanzu/PKS

Cause

This issue occurs when an NSX Manager node is unresponsive (for example, due to high CPU utilization) while an Edge VM is being migrated or restarted on another host. Opaque network port configurations are managed dynamically by the NSX Manager. During a vMotion or HA restart, vCenter must query the NSX Manager for port bindings on the destination host. If the Manager is unresponsive, the query fails and the Edge VM attempts to use stale or non-existent port configurations on the new host, which leads to the port association failure.

Resolution

Follow these steps to restore management connectivity and re-establish the Edge port bindings:

  1. Recover the NSX Manager:

    • Reboot the unresponsive NSX Manager node to clear the high CPU condition.
    • Verify the NSX Manager cluster status is Healthy and controller connectivity from the Edge VMs reports UP.
  2. Force Fresh Port Bindings:

    • Perform a vSphere vMotion of the impacted Edge VM to another ESXi host in the cluster.
    • This migration forces vCenter to initiate a new port allocation workflow, requesting fresh opaque port bindings from the now-healthy NSX Manager.
  3. Verify Connectivity:

    • Confirm the Edge VM interfaces (e.g., fp-eth1) connect successfully.
    • Verify that data plane traffic is restored.
  4. Clean Up Stale Ports (If Necessary):

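
The health check in step 1 can be scripted so that the vMotion in step 2 is attempted only after the Manager cluster has recovered. The sketch below rests on several assumptions that should be verified against the NSX API guide for your version: that the cluster status is available at `GET /api/v1/cluster/status`, that the response carries `mgmt_cluster_status.status` and `control_cluster_status.status` fields, and that a healthy cluster reports `STABLE` in both. The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def fetch_cluster_status(manager_host, username, password, verify_tls=True):
    """Fetch cluster status from an NSX Manager (endpoint path is an assumption)."""
    url = f"https://{manager_host}/api/v1/cluster/status"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for lab setups with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def ready_for_vmotion(status):
    """Return True only if both cluster planes report STABLE.

    The field names and the "STABLE" value are assumptions about the
    response format; missing fields are treated as not ready.
    """
    mgmt = status.get("mgmt_cluster_status", {}).get("status")
    ctrl = status.get("control_cluster_status", {}).get("status")
    return mgmt == "STABLE" and ctrl == "STABLE"
```

A typical use would be `ready_for_vmotion(fetch_cluster_status("nsx-mgr.example.com", "admin", password))`, where the host name is a placeholder. Treating a missing field as "not ready" is deliberate: retrying the migration against a Manager that is still recovering risks repeating the same port binding failure.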

Additional Information

vSphere with Tanzu Guest Cluster Network Assignment Fails with "opaque network with id not found"