Tier-0 Gateway reports status "DOWN" instead of "DEGRADED" during partial Edge Node failure

Article ID: 426254

Updated On:

Products

VMware NSX

Issue/Introduction

  • When one NSX Edge node fails, the Tier-0 Gateway status is displayed as "DOWN" in the NSX Manager UI.

  • Additionally, a "Routing down" alarm is triggered, indicating "No northbound connection".

  • The NSX data plane remains functional, and traffic continues to flow through the remaining healthy NSX Edge node. The expected Tier-0 Gateway status in this scenario is "DEGRADED" (partial redundancy), not "DOWN" (total outage).

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

VMware NSX 9.x
VMware NSX 4.x
VMware NSX-T Data Center 3.x

Cause

This issue is cosmetic and due to a reporting defect in the NSX Manager UI aggregation logic. It affects the monitoring dashboard and alarms only. The actual network forwarding (NSX Data Plane) functions correctly via the remaining active NSX Edge node.

When the TEP and BGP sessions on one Edge node go down completely while management plane connectivity to that Edge node remains active, the system incorrectly weighs the failure of that node's routing components against the entire Gateway status. It fails to aggregate the healthy status of the second node into an overall "DEGRADED" status and instead interprets the loss of redundancy as a total loss of connectivity.
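The expected aggregation behavior can be illustrated with a short Python sketch. This is a simplified model only, not NSX Manager's actual code: the function name `aggregate_gateway_status`, the per-node boolean input, and the "UP" label for the fully healthy case are assumptions made for illustration.

```python
from typing import Iterable


def aggregate_gateway_status(edge_routing_up: Iterable[bool]) -> str:
    """Illustrative model of the expected Tier-0 status aggregation.

    Each element is True if that Edge node's routing components
    (TEP and BGP sessions) are healthy, False otherwise.
    """
    states = list(edge_routing_up)
    if not states:
        # Assumption: a gateway with no Edge nodes has no connectivity.
        return "DOWN"
    if all(states):
        return "UP"        # hypothetical label for full redundancy
    if any(states):
        return "DEGRADED"  # at least one node still forwards traffic
    return "DOWN"          # every node has lost routing


# One failed node and one healthy node should yield DEGRADED.
print(aggregate_gateway_status([False, True]))
```

In this model, "DOWN" is returned only when no Edge node has healthy routing. The defect described here corresponds to "DOWN" being reported for the mixed case instead.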

Resolution

This issue is resolved in VCF 9.0.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Additional Information

To confirm the actual state of the environment, verify the status from the CLI of the remaining NSX Edge node that is in Active status.

  1. SSH to the active NSX Edge node as admin.

  2. Verify BGP status:

    get route bgp neighbor
    

    Ensure state is Established.

  3. Verify TEP status:

    get tunnel-ports
    

    Ensure tunnels are Up.

If the CLI confirms the second NSX Edge node is healthy, the UI status of "DOWN" can be disregarded as a false positive.
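The decision above can be expressed as a small Python sketch. It assumes the operator has already read the BGP neighbor states and tunnel states from the CLI output by hand. The function name `ui_down_is_false_positive` and its inputs are hypothetical, and the sketch does not parse any real NSX CLI output format.

```python
from typing import Sequence


def ui_down_is_false_positive(
    bgp_states: Sequence[str], tunnel_states: Sequence[str]
) -> bool:
    """Return True if the healthy Edge node's checks all pass.

    bgp_states: state strings noted for each BGP neighbor.
    tunnel_states: state strings noted for each tunnel.
    Empty input is treated as "not verified" and returns False.
    """
    bgp_ok = bool(bgp_states) and all(
        s.strip().lower() == "established" for s in bgp_states
    )
    tunnels_ok = bool(tunnel_states) and all(
        s.strip().lower() == "up" for s in tunnel_states
    )
    return bgp_ok and tunnels_ok


# Example with states noted manually from the active Edge node.
print(ui_down_is_false_positive(["Established"], ["Up", "Up"]))
```

If either check fails or was not performed, the function returns False, and the "DOWN" status should not be dismissed without further investigation.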