[NSX] Steps to perform when NSX Edge goes unresponsive


Article ID: 377076

Products

VMware NSX Networking

Issue/Introduction

This article describes the steps to take and the data to collect if an NSX Edge node appears unresponsive.

Environment

NSX

Cause

Edge nodes can become unresponsive for a variety of reasons, including but not limited to CPU or memory contention, memory leaks, underlying storage issues, and physical network issues.

Resolution

If you believe an Edge node has stopped responding, collect the data below when opening a support request with Broadcom Support:

Ping the management IP of the Edge.
- Ping from NSX-MGR to the Edge. What is the result?
Log in to the Edge if possible. Is it responsive to SSH?
- If there is no SSH access, is it accessible via the console?
Once logged in via console or SSH, run the commands below and record the results:
- get managers
- get controllers
- get edge-cluster status
- get bfd-sessions
- get bfd-sessions stats
Through the vCenter UI, is the Edge management network adapter in a connected state?
What is the status of the Edge node in the NSX-MGR UI (System → Fabric → Nodes)?
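
The ping and CLI checks above can be captured in one pass from a workstation that can reach the Edge management IP. The following is a minimal sketch, not a supported tool: the function names and the PING_CMD/SSH_CMD overrides are illustrative, and it assumes the admin account accepts NSX CLI commands passed non-interactively over SSH (with password authentication, ssh prompts once per command). If SSH is unavailable, run the same commands from the console and record the output manually.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the PING_CMD / SSH_CMD overrides are
# illustrative assumptions, not part of the original article.

# Ping the Edge management IP and print a one-line result.
check_ping() {
  local host="$1"
  if "${PING_CMD:-ping}" -c 3 "$host" >/dev/null 2>&1; then
    echo "ping ${host}: reachable"
  else
    echo "ping ${host}: no response"
  fi
}

# Run each NSX CLI command over SSH and append the command and its output
# to out_file, so the results can be attached to the support request.
record_edge_cli() {
  local host="$1" out_file="$2" cmd
  shift 2
  for cmd in "$@"; do
    printf '### %s\n' "$cmd" >>"$out_file"
    "${SSH_CMD:-ssh}" "admin@${host}" "$cmd" >>"$out_file" 2>&1 \
      || printf '(command failed or no SSH access)\n' >>"$out_file"
  done
}
```

Example usage, with a placeholder address: `check_ping 192.0.2.10`, then `record_edge_cli 192.0.2.10 edge-cli.txt "get managers" "get controllers" "get edge-cluster status" "get bfd-sessions" "get bfd-sessions stats"`.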

Run the API call: GET /api/v1/transport-nodes/<tn-uuid>/state

curl -v -k -H "Content-Type:application/json" -u admin -X GET 'https://<Edge-MGMT-IP>/api/v1/transport-nodes/<tn-uuid>/state'
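
To keep the raw API response for the support request, the call above can be wrapped so the output is written to a file. This is a hedged sketch: the function names and the CURL_CMD override are hypothetical, and the host argument is whatever address you use in the URL above. Because -u is given a user name only, curl prompts for the admin password.

```shell
#!/usr/bin/env bash
# Sketch only: tn_state_url, save_tn_state, and CURL_CMD are illustrative
# names, not part of the original article.

# Build the URL for the transport-node state call.
tn_state_url() {
  local host="$1" tn_uuid="$2"
  printf 'https://%s/api/v1/transport-nodes/%s/state\n' "$host" "$tn_uuid"
}

# Fetch the state and save the raw JSON response to out_file.
# -k skips certificate verification, as in the article's command.
save_tn_state() {
  local host="$1" tn_uuid="$2" out_file="$3"
  "${CURL_CMD:-curl}" -k -sS -H "Content-Type: application/json" -u admin \
    -X GET "$(tn_state_url "$host" "$tn_uuid")" -o "$out_file"
}
```

Example usage, with placeholder values: `save_tn_state 192.0.2.10 <tn-uuid> tn-state.json`.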

Gather log bundles from the following:
- The NSX-MGR nodes.
- The impacted Edge node.
- The ESXi host currently servicing the Edge, and the host that was servicing it when the issue first occurred (if the Edge was vMotioned).
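
When collecting bundles from the command line rather than the NSX-MGR UI, a consistent file name per node makes the bundles easier to match to the list above. The sketch below only builds the command string. It assumes the NSX CLI command `get support-bundle file <filename>` is available on your NSX version (check the CLI reference for your release), and the helper name and naming scheme are hypothetical. On the ESXi hosts, the usual tool is `vm-support`.

```shell
#!/usr/bin/env bash
# Sketch only: nsx_bundle_cmd and the <node>-<timestamp>.tgz naming scheme
# are illustrative assumptions, not part of the original article.

# Print the NSX CLI command that writes a support bundle on a Manager or
# Edge node, using a file name that identifies the node and collection time.
nsx_bundle_cmd() {
  local node="$1" stamp="$2"
  printf 'get support-bundle file %s-%s.tgz\n' "$node" "$stamp"
}
```

Example usage: `nsx_bundle_cmd edge01 "$(date +%Y%m%d-%H%M%S)"` prints the command to run in the NSX CLI on that node.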
