Edge HA Member Status down Alarm

Products

VMware NSX

Issue/Introduction

Title : Alarm for Edge HA Member Status Down
Event ID: edge_health.edge_ha_member_status_down
Alarm Description

Purpose: To inform users that the edge node is set to down . The reasons could be various such as some of the following which causes the edge node to be down.
- Datapath not running
- Overlay connectivity issue
- Host switch configuration issue
- Bootup health pre-check failed
- Edge in read-only mode, etc.

The reason is included in the alarm.

Impact:
In an Active/Active configuration, the SRs will continue to be running on the other healthy edge nodes. In an Active/Standby setup, there will be no impact if the edge cluster has a healthy edge node member that continues servicing the SRs.

Environment

VMware NSX
VCF 9.0 and above.

Resolution

Steps to Resolve
For VCF 9.0 and higher

Recommended Action :

Resolution varies based on different down reasons

"Bootup Precheck Failed":
- This can be due to one of followings:
  - "Controller Mgmt Vertical Down"/"Edge Vertical Down"/"Switching Vertical Down":
    - Check managers connectivity via CLI "get managers".
    - "restart service nsx-opsagent" to restart opsagent.
    - Ensure the edge is correctly joined to the manager.
    - Otherwise, check events from syslog further to see why MP connectivity is down.
  - "CCP Session Down":
    - Restart controller service from managers.
    - Otherwise, troubleshoot from the manager to see if any CCP session is pushed to the edge.
  - "NestDB Not Ready":
    - Invoke CLI "restart service nestdb" to restart NestDB service.
  - "NSD Not Ready":
    - Toggle maintenance mode to see if this down reason is resolved.
    - Otherwise, check events from syslog further to see why NSD service is unhealthy.
"Read-only File System":
- Reboot the edge after resolving the storage issue.
"HA not in Active state":
- Can be various reasons. Invoke CLI "get edge-cluster history state" to see the transition history and the last state of edge.
- Ensure correctness of edge-cluster configuration.
- Check events from syslog further if required to see why edge is not transitioned to active.
"Datapath not connected":
- Invoke CLI "get service dataplane" to see running status.
- "restart service dataplane" to restart the dataplane.
"No overlay host switch configured"/"VTEP device down"/"VTEP IP not configured":
- For VM edge, ensure the vnic for VTEP is connected as well as the VM network setting. For bare metal edge, verify physical connection to the NIC is secure.
- Invoke CLI "get host-switches" to ensure correctness of host-switches configuration
"RTEP device down":
- For VM edge, ensure the vnic for RTEP is connected as well as the VM network setting. For bare metal edge, verify physical connection to the NIC is secure.
- Invoke CLI "get host-switches" to ensure correctness of host-switches configuration
"VTEP tunnels down":
- Troubleshoot overlay connectivity issue further if host configuration is correctly applied

Maintenance windows required for remediation?
No. Edge is already in Down state.

Edge HA Member Status down Alarm

Article ID: 373590

Updated On:

Products

Issue/Introduction

Environment

Resolution

Feedback