NSX Edge HA Failover and BGP Adjacency Drops Due to Host Resource Contention

Article ID: 434080

Products

VMware NSX

Issue/Introduction

  • NSX Edge experiences intermittent tunnel connectivity loss, as seen in the Edge /var/log/syslog:
    2026-03-11T12:29:08.892Z edge.example.com NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel <local-vtep-ip>:<remote-vtep-ip> state changed from Up to Unreachable
    2026-03-11T12:29:08.896Z edge.example.com NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel <local-vtep-ip>:<remote-vtep-ip> state changed from Up to Unreachable

  • This loss of connectivity triggers High Availability (HA) failovers, as seen in the Edge /var/log/syslog:
    2026-03-11T12:29:08.926Z edge.example.com NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="WARN" eventId="vmwNSXClusterNodeStatus"] {"event_state":4,"event_external_reason":"Edge node status changed: Up -> Down , reason: VTEP tunnels down","event_src_comp_id":"1990####-####-####-####-#######575ac","event_sources":{"id":"1990####-####-####-####-#######575ac"}}
    2026-03-11T12:29:09.051Z edge.example.com NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="WARN" eventId="vmwNSXClusterFailoverStatus"] {"event_state":1,"event_external_reason":"Service router switches over from Active to Down. edge node down","event_src_comp_id":"1990####-####-####-####-#######575ac","event_sources":{"id":"2171####-####-####-####-########0ebe","router_id":"69db####-####-####-####-#######baff"}}

  • The failover causes BGP adjacencies on the Edge to drop, as seen in the Edge /var/log/frr/frr.log:
    2026/03/11 12:29:10.298195 BGP: %NOTIFICATION: sent to neighbor <neighbor-ip-address> 6/2 (Cease/Administratively Shutdown) 0 bytes

  • When tunnel connectivity is restored, another HA failover is triggered as the Edge re-establishes its BGP adjacencies.

  • Symptoms are resolved by migrating (vMotion) the Edge VM to a different host.
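
The sequence above (tunnel loss, Edge node down, BGP Cease) can be confirmed by lining up timestamps from the two log files. The following is a minimal Python sketch, not a supported tool. The regular expressions are modeled only on the sample lines in this article, and the function names, the 5-second window, and the assumption that both logs use the same clock are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Patterns modeled on the sample log lines in this article (assumption:
# real logs keep the same wording).
TUNNEL_DOWN = re.compile(r"HA tunnel \S+ state changed from Up to Unreachable")
NODE_DOWN = re.compile(r"Edge node status changed: Up -> Down")
BGP_CEASE = re.compile(r"%NOTIFICATION: sent to neighbor \S+ 6/2")


def parse_syslog_time(line):
    """Parse the leading timestamp of an Edge syslog line, e.g. 2026-03-11T12:29:08.892Z."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def parse_frr_time(line):
    """Parse the leading timestamp of an frr.log line, e.g. 2026/03/11 12:29:10.298195."""
    date_part, time_part = line.split(" ", 2)[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y/%m/%d %H:%M:%S.%f")


def collect_events(syslog_lines, frr_lines):
    """Return a time-sorted list of (timestamp, kind) for the three symptom types."""
    events = []
    for raw in syslog_lines:
        line = raw.strip()
        if TUNNEL_DOWN.search(line):
            events.append((parse_syslog_time(line), "tunnel_unreachable"))
        elif NODE_DOWN.search(line):
            events.append((parse_syslog_time(line), "edge_node_down"))
    for raw in frr_lines:
        line = raw.strip()
        if BGP_CEASE.search(line):
            events.append((parse_frr_time(line), "bgp_cease"))
    return sorted(events)


def group_incidents(events, window_seconds=5):
    """Group events into incidents; an event joins the current incident if it
    occurs within window_seconds of that incident's first event."""
    window = timedelta(seconds=window_seconds)
    incidents = []
    for when, kind in events:
        if incidents and when - incidents[-1]["start"] <= window:
            incidents[-1]["kinds"].append(kind)
        else:
            incidents.append({"start": when, "kinds": [kind]})
    return incidents
```

An incident whose kinds include tunnel_unreachable, edge_node_down, and bgp_cease in that order matches the pattern described in this article. Repeated incidents that stop after a vMotion point toward host-level contention.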

Environment

VMware NSX
VMware ESXi

Cause

The NSX Edge VM is starved of compute or network resources because of severe contention with workload VMs running on the same ESXi host.

Resolution

  • Relocate the affected NSX Edge VMs to a dedicated Edge cluster or separate ESXi hosts to prevent co-location with general workload VMs.

  • If temporary co-location is unavoidable, configure strict CPU and memory reservations for the Edge VMs to guarantee resource availability and prevent datapath starvation.

  • Validate host placement and Distributed Resource Scheduler (DRS) anti-affinity rules to ensure Edge VMs are not inadvertently migrated back to congested workload hosts.
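
Host placement can be spot-checked from an inventory export before and after changing DRS rules. The sketch below assumes a simple VM-to-host mapping, obtained by whatever means is available in the environment. The function name, the input format, and the sample VM and host names are all hypothetical.

```python
from collections import defaultdict


def find_shared_hosts(vm_to_host, edge_vms):
    """Return the hosts that run at least one Edge VM and at least one other VM.

    vm_to_host: dict mapping VM name -> ESXi host name (hypothetical export format)
    edge_vms:   set of VM names that are NSX Edge nodes
    """
    by_host = defaultdict(lambda: {"edges": [], "workloads": []})
    for vm, host in vm_to_host.items():
        role = "edges" if vm in edge_vms else "workloads"
        by_host[host][role].append(vm)
    # Keep only hosts where Edge VMs share the host with non-Edge VMs.
    return {
        host: {role: sorted(vms) for role, vms in groups.items()}
        for host, groups in by_host.items()
        if groups["edges"] and groups["workloads"]
    }


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = {
        "edge-01": "esx-a",
        "app-01": "esx-a",
        "edge-02": "esx-b",
        "app-02": "esx-c",
    }
    for host, groups in find_shared_hosts(inventory, {"edge-01", "edge-02"}).items():
        print(host, "runs Edge VMs", groups["edges"], "alongside", groups["workloads"])
```

An empty result means that no Edge VM currently shares a host with workload VMs. It does not prove that DRS will keep it that way, so the rules themselves still need to be reviewed.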

 

Additional Information

NSX Edge VM CPU and Memory Requirements