Edge HA Member Status down Alarm
search cancel

Edge HA Member Status down Alarm

book

Article ID: 373590

calendar_today

Updated On:

Products

VMware NSX

Issue/Introduction

Title : Alarm for Edge HA Member Status Down
Event ID: edge_health.edge_ha_member_status_down
Alarm Description

  • Purpose: To inform users that the edge node is set to down . The reasons could be various such as some of the following which causes the edge node to be down.
    • Datapath not running
    • Overlay connectivity issue
    • Host switch configuration issue
    • Bootup health pre-check failed
    • Edge in read-only mode, etc.

          The reason is included in the alarm.

  • Impact: 

    In an Active/Active configuration, the SRs will continue to be running on the other healthy edge nodes. In an Active/Standby setup, there will be no impact if the edge cluster has a healthy edge node member that continues servicing the SRs.

Environment

  • VMware NSX
  • VCF 9.0 and above.

Resolution

Steps to Resolve
For VCF 9.0 and higher

Recommended Action

Resolution varies based on different down reasons

  1. "Bootup Precheck Failed":
    • This can be due to one of followings:
      • "Controller Mgmt Vertical Down"/"Edge Vertical Down"/"Switching Vertical Down":
        • Check managers connectivity via CLI "get managers".
        • "restart service nsx-opsagent" to restart opsagent.
        • Ensure the edge is correctly joined to the manager.
        • Otherwise, check events from syslog further to see why MP connectivity is down. 
      • "CCP Session Down":
        • Restart controller service from managers.
        • Otherwise, troubleshoot from the manager to see if any CCP session is pushed to the edge. 
      • "NestDB Not Ready":
        • Invoke CLI "restart service nestdb" to restart NestDB service.
      • "NSD Not Ready":
        • Toggle maintenance mode to see if this down reason is resolved.
        • Otherwise, check events from syslog further to see why NSD service is unhealthy.
  2. "Read-only File System":
    • Reboot the edge after resolving the storage issue.
  3. "HA not in Active state":
    • Can be various reasons. Invoke CLI "get edge-cluster history state" to see the transition history and the last state of edge.
    • Ensure correctness of edge-cluster configuration.
    • Check events from syslog further if required to see why edge is not transitioned to active.
  4. "Datapath not connected":
    • Invoke CLI "get service dataplane" to see running status.
    • "restart service dataplane" to restart the dataplane. 
  5. "No overlay host switch configured"/"VTEP device down"/"VTEP IP not configured":
    • For VM edge, ensure the vnic for VTEP is connected as well as the VM network setting. For bare metal edge, verify physical connection to the NIC is secure.
    • Invoke CLI "get host-switches" to ensure correctness of host-switches configuration
  6. "RTEP device down":
    • For VM edge, ensure the vnic for RTEP is connected as well as the VM network setting. For bare metal edge, verify physical connection to the NIC is secure.
    • Invoke CLI "get host-switches" to ensure correctness of host-switches configuration
  7.  "VTEP tunnels down": 
    • Troubleshoot overlay connectivity issue further if host configuration is correctly applied

Maintenance windows required for remediation?
No. Edge is already in Down state.