Troubleshooting NSX Edge Node Down Issues

Article ID: 422576

Updated On:

Products

VMware NSX

Issue/Introduction

An Edge node is considered functional only when its High Availability (HA) state is Active. If the HA state is not Active, no service can run as Active on that Edge node.

The Edge HA state can be queried through the CLI or the API, as shown below.


CLI

nsxedge> get edge-cluster status
High Availability State     : Inactive
                  Since     : 2025-10-01T17:25:34.39
Edge Node Id                : 87ab8f7e-####-####-a0ef-0200142d10a9
Edge Node Status            : Down
Edge Node Down Reason       : VTEP device down
Admin State                 : Up
Vtep State                  : Down
Configuration               : applied
Health Check Config         :
    Interval                : 1000 msec
    Deadtime                : 3000 msec
    Max Hops                : 1
Service Status              :
    Datapath Config Channel : Up
    Datapath Status Channel : Up
    Routing Status Channel  : Up
    Routing Status          : Up
Peer Status                 :
    Node Id                 : 1f009b8c-9a33-####-####-005056a19a2d
    Node Thumbprint         : 51:44:BE:5C:##:##:##:##:##:2C:58:D1:74:55:76:14:D6:CA:6F:FF:3E:36:5C:BC:95:36:5C:C6:D8:86:5D:CF
    Node Status             : Up (Routing Down)
    Healthcheck Sessions    :
        Interface           : eth0
        Session             : 192.###.144.19:192.163.###.74
        Status              : Concat Path Down

        Interface           : vtep-0
        Device              : fp-eth0
        Session             : 192.###.13.127:192.###.21.132
        Status              : Concat Path Down
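
When this output is captured to a file or a ticket, the node-level fields can be pulled out with a few lines of Python. The following is a minimal sketch, not an NSX tool: the function name `parse_edge_cluster_status` and the sample text are illustrative, and it assumes the output keeps the `Key : Value` layout shown above, with node-level fields unindented and peer fields listed after `Peer Status`.

```python
def parse_edge_cluster_status(text):
    """Collect the unindented 'Key : Value' fields of the local Edge node.

    Parsing stops at 'Peer Status' so peer fields never overwrite node fields.
    """
    fields = {}
    for line in text.splitlines():
        if line.strip().startswith("Peer Status"):
            break
        # Indented lines (Since, health check settings, channels) are skipped.
        if not line or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the output above, used only as sample input.
SAMPLE_STATUS = """\
High Availability State     : Inactive
                  Since     : 2025-10-01T17:25:34.39
Edge Node Status            : Down
Edge Node Down Reason       : VTEP device down
Admin State                 : Up
Vtep State                  : Down
Peer Status                 :
    Node Status             : Up (Routing Down)
"""

status = parse_edge_cluster_status(SAMPLE_STATUS)
print(status["High Availability State"], "-", status.get("Edge Node Down Reason"))
```

The two fields to read first are `High Availability State` and `Edge Node Down Reason`; the rest of the output helps narrow down which component is affected.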


If the Edge was down earlier but has since recovered, the state history shows the event and reason that caused the Edge node to go down.

nsxedge> get edge-cluster history state
State       : Disabled
Time        : 2025-10-01T17:08:13.04
Event       : Init
Reason      : Init

State       : Offline
Time        : 2025-10-01T17:08:13.04
Event       : Config Updated
Reason      : Config Updated

State       : Discover
Time        : 2025-10-01T17:08:28.47
Event       : Datapath Connected
Reason      : DP Connected

State       : StateSync
Time        : 2025-10-01T17:08:31.40
Event       : BFD State Updated
Reason      : Updated

State       : Inactive
Time        : 2025-10-01T17:08:31.40
Event       : State Sync Completed
Reason      : Updated

State       : Active
Time        : 2025-10-01T17:10:27.55
Event       : Bootup Precheck Passed
Reason      : Bootup Precheck Passed

State       : Inactive
Time        : 2025-10-01T17:25:34.39
Event       : Node State Changed
Reason      : Device Down
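
To find the relevant entry in a long history, the records can be scanned for the most recent transition from Active to Inactive. The sketch below assumes records are separated by blank lines, as in the output above; the helper names `parse_state_history` and `last_down_event` are illustrative and are not part of any NSX tooling.

```python
def parse_state_history(text):
    """Split blank-line-separated records into dicts (State, Time, Event, Reason)."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def last_down_event(records):
    """Return the most recent record where the state went Active -> Inactive."""
    for i in range(len(records) - 1, 0, -1):
        if (records[i].get("State") == "Inactive"
                and records[i - 1].get("State") == "Active"):
            return records[i]
    return None


# Last three records of the history above, used only as sample input.
SAMPLE_HISTORY = """\
State       : Inactive
Time        : 2025-10-01T17:08:31.40
Event       : State Sync Completed
Reason      : Updated

State       : Active
Time        : 2025-10-01T17:10:27.55
Event       : Bootup Precheck Passed
Reason      : Bootup Precheck Passed

State       : Inactive
Time        : 2025-10-01T17:25:34.39
Event       : Node State Changed
Reason      : Device Down
"""

event = last_down_event(parse_state_history(SAMPLE_HISTORY))
print(event["Time"], event["Event"], event["Reason"])
```

The first Inactive record in the sample follows StateSync during bootup, so it is deliberately not reported as a down event.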


API

curl -k -u <user>:<passwd> -X GET https://<ua-mgmt-ip>/api/v1/transport-nodes/<edge-node-uuid>/status
{
  "node_uuid" : "87ab8f7e-####-####-a0ef-0200142d10a9",
  "node_display_name" : "kf009135-nsxedge-ob-#######-1-T0",
  "status" : "DOWN",
  ......
  "status_description" : "Status DOWN caused by [high availability status], please check sub-status fields.",
  "last_aggsvc_heartbeat" : 1759338713698,
  "last_status_changed_time" : 1759338697911,
  "vtep_state" : "UP",
  "storage_state" : "READ_WRITE",
  "high_availability_status" : "DOWN",
  ......
}


The NSX API reports only the overall HA status. Use the CLI to find the reason the Edge node is down.
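
For scripted checks, the same request can be made from Python with only the standard library. This is a rough sketch under stated assumptions: the function and parameter names (`fetch_transport_node_status`, `manager`, `needs_cli_followup`) are hypothetical, `manager` stands for the address the curl example calls `<ua-mgmt-ip>`, and certificate verification is disabled only to mirror `curl -k`.

```python
import base64
import json
import ssl
import urllib.request


def fetch_transport_node_status(manager, node_uuid, user, password):
    """GET /api/v1/transport-nodes/<uuid>/status, mirroring the curl call above."""
    url = f"https://{manager}/api/v1/transport-nodes/{node_uuid}/status"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    # Equivalent of curl -k: certificate checks are skipped (assumption: lab use).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def needs_cli_followup(status):
    """True when the API reports the HA status as DOWN."""
    return status.get("high_availability_status") == "DOWN"


# Subset of the response shown above, used only as sample input.
sample = {
    "status": "DOWN",
    "vtep_state": "UP",
    "storage_state": "READ_WRITE",
    "high_availability_status": "DOWN",
}
if needs_cli_followup(sample):
    print("HA is DOWN: run 'get edge-cluster status' on the Edge for the reason")
```

Because the response carries no down reason, the script can only flag the node; the follow-up step remains the CLI commands shown earlier.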

Environment

VMware NSX

Cause

Edge HA can be in one of the following states:

  • Disabled
    • The Edge has not been added to an edge-cluster yet.
    • The Edge is not connected to the CCP. Check CCP connectivity.
  • Offline
    • The datapath process is not up and running.
  • Discover
    • The Edge is still in the discovery phase, finding all ESXi hosts and Edges in the NSX domain. This is a transient state and should transition to StateSync within 1 to 2 minutes at most. It usually occurs only right after the Edge boots or the datapath service restarts.
  • StateSync
    • The Edge is syncing HA state with the other Edges in the same edge-cluster. This is also a transient state. Once the state sync completes, the Edge transitions to the Inactive state.
  • Inactive
    • The Edge node is down. If the Edge stays in this state and does not transition to Active, see the Resolution section below for the reasons an Edge node can be down.
  • Active
    • The Edge is up and running.
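
The state list above can be kept as a small lookup table for triage scripts. The sketch below only restates the descriptions in this article; the names `STATE_HINTS`, `TRANSIENT_STATES`, and `triage` are illustrative.

```python
# States the article describes as transient.
TRANSIENT_STATES = {"Discover", "StateSync"}

# One-line hint per HA state, taken from the descriptions above.
STATE_HINTS = {
    "Disabled": "Check edge-cluster membership and CCP connectivity.",
    "Offline": "Check that the datapath process is running.",
    "Discover": "Transient; should move to StateSync within 1 to 2 minutes.",
    "StateSync": "Transient; moves to Inactive once state sync completes.",
    "Inactive": "Edge node is down; check the Edge Node Down Reason.",
    "Active": "Edge is up and running.",
}


def triage(state):
    """Return a small summary for an HA state name as printed by the CLI."""
    if state not in STATE_HINTS:
        raise ValueError(f"unknown HA state: {state!r}")
    return {
        "state": state,
        "healthy": state == "Active",
        "transient": state in TRANSIENT_STATES,
        "hint": STATE_HINTS[state],
    }


print(triage("Inactive")["hint"])
```

A monitoring script might tolerate a transient state for a couple of minutes before alerting, and alert immediately on Disabled, Offline, or a persistent Inactive.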

Resolution

Reasons for Edge Node Down

For issues that do not have a separate KB listed above, refer to the KB "Edge HA Member Status down Alarm" for troubleshooting.