NSX Edge nodes display "DOWN" status after physical NIC replacement on ESXi host.

Article ID: 423183

Products

VMware NSX

Issue/Introduction

After a physical NIC is replaced on an ESXi host dedicated to NSX Edge nodes, several Edge nodes (e.g., 3 out of 4) transition to a "DOWN" status.
Although the physical NICs are reported as "UP" on the ESXi host, NSX Manager reports a tunnel status failure.

Symptoms:

  • Some Edge nodes remain UP while others on the same host are DOWN.
  • Status description: "Status DOWN caused by [tunnel], please check sub-status fields."
  • TEP traffic shows transmitted packets but zero received packets.
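
The transmit-without-receive symptom can be checked mechanically once per-TEP packet counters have been collected. The following is a minimal sketch, not an NSX API: the TepCounters type, the one_way_teps helper, and the sample values are hypothetical, and how the counters are gathered is left to the operator.

```python
from dataclasses import dataclass


@dataclass
class TepCounters:
    """Packet counters for one Edge TEP interface (hypothetical structure)."""
    edge_name: str
    tx_packets: int
    rx_packets: int


def one_way_teps(counters):
    """Return the names of Edge nodes whose TEP sends packets but receives none."""
    return [c.edge_name for c in counters if c.tx_packets > 0 and c.rx_packets == 0]


# Illustrative values only; not taken from a real environment.
samples = [
    TepCounters("edge-01", tx_packets=1520, rx_packets=1498),
    TepCounters("edge-02", tx_packets=1377, rx_packets=0),
    TepCounters("edge-03", tx_packets=1402, rx_packets=0),
    TepCounters("edge-04", tx_packets=1391, rx_packets=0),
]

print(one_way_teps(samples))
```

Edge nodes returned by this check are the ones whose outbound tunnel traffic leaves the host but whose return path is broken, which matches the pattern described above.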

Environment

VMware NSX

Cause

The primary cause was a physical cabling error following the hardware replacement.
Diagnostic analysis revealed:

  • ARP Failure: Affected Edge nodes were unable to resolve ARP for their tunnel peers (other Edge nodes or ESXi TEPs).
  • Packet Discontinuity: Packet captures confirmed that ARP requests left the source ESXi host's vmnic but never arrived at the destination peer's vmnic.
  • Network Mismatch: Traffic captured on the replaced NIC contained unexpected packets, indicating that the NIC was connected to a different network rather than the intended Overlay/Transport VLAN.
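
The network-mismatch finding can be illustrated by comparing the VLAN IDs seen in a capture on the replaced NIC against the expected transport VLAN. This is a sketch under stated assumptions: the vlan_report helper is hypothetical, the VLAN IDs are invented for illustration, and the list of observed IDs is assumed to have already been extracted from a packet capture by other means.

```python
from collections import Counter


def vlan_report(observed_vlan_ids, expected_vlan):
    """Summarize which VLAN IDs appeared in a capture relative to the expected one."""
    counts = Counter(observed_vlan_ids)
    return {
        # Number of frames seen on the expected transport VLAN (0 if none).
        "expected_frames": counts[expected_vlan],
        # Every other VLAN ID that appeared in the capture, in ascending order.
        "unexpected_vlans": sorted(v for v in counts if v != expected_vlan),
    }


# Hypothetical example: transport VLAN 200 is expected, but the capture
# only contains frames tagged 110 and 120.
report = vlan_report([110, 110, 120, 110], expected_vlan=200)
print(report)
```

A result with no frames on the expected VLAN and one or more unexpected VLANs is consistent with the NIC being cabled into the wrong network, although it does not by itself rule out a switch-port configuration problem.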

Resolution

To resolve this issue, verify and correct the physical network topology:

  • Identify Uplink Mapping: Confirm which vmnic is associated with the VDS used for Edge TEP traffic.
  • Physical Cable Verification: Ensure the network cables are connected to the correct physical switch ports configured with the appropriate VLANs.
  • Hardware Compatibility: Ensure the new NIC and its driver/firmware version are listed on the VMware Compatibility Guide for the specific ESXi version in use.
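
The cable verification step can be sketched as a comparison between the documented cabling plan and what is actually observed on each uplink. The example below is illustrative only: the find_miscabled helper, the switch names, and the port IDs are hypothetical, and the observed data is assumed to come from whatever source is available, such as LLDP or CDP neighbor information if discovery is enabled, or a manual trace of the cables.

```python
def find_miscabled(expected, observed):
    """Return {vmnic: (expected_port, observed_port)} for every mismatch.

    A vmnic with no observed neighbor is reported with observed_port None.
    """
    return {
        vmnic: (port, observed.get(vmnic))
        for vmnic, port in expected.items()
        if observed.get(vmnic) != port
    }


# Hypothetical cabling plan and observed neighbors: (switch name, port ID).
expected = {
    "vmnic2": ("tor-a", "Eth1/10"),
    "vmnic3": ("tor-b", "Eth1/10"),
}
observed = {
    "vmnic2": ("tor-a", "Eth1/10"),
    "vmnic3": ("tor-b", "Eth1/24"),  # replaced NIC cabled to a different port
}

for vmnic, (want, got) in find_miscabled(expected, observed).items():
    print(f"{vmnic}: expected {want}, observed {got}")
```

Any vmnic reported here should be re-cabled, or its switch port reconfigured, so that it lands on a port carrying the Overlay/Transport VLAN.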