[NSX] Steps to perform when NSX Edge goes unresponsive
Article ID: 377076
Products
VMware NSX Networking
Issue/Introduction
This article describes the steps to take and the data to collect if an NSX Edge node appears unresponsive.
Environment
NSX
Cause
Edge nodes can become unresponsive for a variety of reasons, including but not limited to CPU or memory contention, memory leaks, underlying storage issues, and physical network issues.
Resolution
If you believe an Edge node has stopped responding, collect the data below when opening a support request with Broadcom Support:
Ping the management IP of the Edge.
Ping from NSX-MGR to the Edge and record the result.
Log in to the Edge if possible. Is it responsive to SSH?
If there is no SSH access, is it accessible via the console?
Once logged in via console or SSH, run the commands below and record the results:
get managers
get controllers
get edge-cluster status
get bfd-sessions
get bfd-sessions stats
Through the vCenter UI: is the Edge management network adapter in a connected state?
What is the status of the Edge node in the NSX-MGR UI (System → Fabric → Nodes)?
Run the API call: GET /api/v1/transport-nodes/<tn-uuid>/state
curl -v -k -H "Content-Type:application/json" -u admin -X GET 'https://<Edge-MGMT-IP>/api/v1/transport-nodes/<tn-uuid>/state'
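If the state needs to be collected repeatedly, the same GET call can be scripted. The following is a minimal Python sketch, not an official tool: the function names `build_state_request` and `fetch_state` are hypothetical, `host` is whatever address the curl command above targets, and certificate verification is disabled only to mirror curl's `-k` flag.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def build_state_request(host, tn_uuid, user, password):
    """Build (but do not send) the GET request for the transport node state.

    `host` and `tn_uuid` correspond to the placeholders in the curl command.
    """
    url = "https://{}/api/v1/transport-nodes/{}/state".format(
        host, urllib.parse.quote(tn_uuid, safe="")
    )
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", "Basic " + token)
    return req


def fetch_state(req, timeout=10):
    """Send the request and return the decoded JSON body.

    Certificate verification is turned off to mirror curl's -k flag;
    this is an assumption for lab use, not a recommendation.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return json.load(resp)
```

A typical use would be `fetch_state(build_state_request(host, tn_uuid, "admin", password))`, saving the returned JSON alongside the other data for the support request.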
Gather the following log bundles:
NSX-MGR nodes.
The impacted Edge node.
The ESXi host currently servicing the Edge and, if the Edge was vMotioned, the host that was servicing it when the issue first occurred.
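The initial reachability checks in the Resolution steps (ping the Edge management IP, test SSH access) can be recorded with a small script run from an administrator workstation. This is a hedged sketch: the helper names are hypothetical, the ping flags assume a standard Linux, macOS, or Windows `ping` binary, and an open TCP port 22 only shows that the port accepts connections, not that a login succeeds. The ping from NSX-MGR to the Edge still has to be run from the NSX Manager itself.

```python
import platform
import socket
import subprocess


def ping_command(host, count=3):
    """Return the ping command line for the local operating system."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def ping_ok(host, count=3):
    """Return True if the ping command exits with status 0."""
    try:
        result = subprocess.run(
            ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No ping binary available on this machine.
        return False
    return result.returncode == 0


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(host):
    """Collect both results in one dictionary for the support request."""
    return {"ping": ping_ok(host), "ssh_port_open": tcp_port_open(host, 22)}
```

Calling `triage("<Edge-MGMT-IP>")` with the real management address returns a dictionary that can be pasted into the support request together with the CLI output and log bundles.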