NSX Edges not functioning after complete storage loss

Article ID: 417941

Products

VMware NSX

Issue/Introduction

The environment experienced a complete storage loss that has since been resolved, but the NSX Edge VMs do not boot properly and report the following:

  • [37.#####] audit: kauditd hold queue overflow
  • Management channel on ####### to Transport Node ######## (#.#.#.#) is down for # minutes. 
  • Edge service status changed. The service dispatcher changed from STARTED to STOPPED. 
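
When triaging several Edges, it can help to scan collected console output or alarm text for the three messages above. The following is a minimal sketch, not a VMware tool: the pattern names and the `match_symptoms` helper are hypothetical, and the patterns match the masked fields (shown as # in this article) loosely.

```python
import re

# Patterns derived from the three messages quoted in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "kauditd_overflow": re.compile(r"audit: kauditd hold queue overflow"),
    "mgmt_channel_down": re.compile(
        r"Management channel on .+ to Transport Node .+ is down"
    ),
    "dispatcher_stopped": re.compile(
        r"service dispatcher changed from STARTED to STOPPED"
    ),
}


def match_symptoms(lines):
    """Return the sorted labels of the symptoms found in an iterable of text lines."""
    found = set()
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # Sample input copied from the messages in this article (fields still masked).
    sample = [
        "[37.#####] audit: kauditd hold queue overflow",
        "Edge service status changed. The service dispatcher changed from STARTED to STOPPED.",
    ]
    print(match_symptoms(sample))
```

An Edge that matches none of these patterns may still be affected, so an empty result does not rule out damage from the storage loss.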

Environment

VMware NSX 4.x

Cause

An abrupt loss of storage can cause several issues on the Edge VMs:

  • OS corruption
  • The Edge booting with its file system in read-only mode
  • TX queue overflow alarms being reported
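
One way to check for the read-only condition is to inspect the mount table of the Edge's underlying Linux OS. The sketch below assumes you have the contents of `/proc/mounts`, either read directly on a system with a Python interpreter or copied off the Edge as text. The `read_only_mounts` helper is hypothetical and only parses the standard mount table format.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            print(read_only_mounts(handle.read()))
    except OSError as exc:
        # /proc/mounts exists only on Linux systems.
        print(f"Could not read /proc/mounts: {exc}")
```

Some pseudo file systems are mounted read-only by design, so the entries of interest are the root file system and other disk-backed mount points.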

The exact cause of the Edge failure can be difficult to determine because it depends on:

  1. How the storage was lost
  2. How long the storage was unavailable
  3. What the OS reports in its error messages, such as corruption errors

Resolution

Record the configuration of the Edges:

  1. Which types of transport zones are present
  2. The networking configuration (which distributed switch the VM uses and which NICs are in use by the VM)

  • Verify that TCP port 1234 between the Edge VMs and the NSX Managers is still open and accessible
  • Reboot the affected NSX Edge VMs, one at a time
  • Review and apply "NSX-T Edge file system becomes Read-Only"
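
The port check above can be approximated with a plain TCP connection test. This is a minimal sketch under stated assumptions: it treats port 1234 as TCP, the `tcp_port_open` helper is hypothetical, and the NSX Manager addresses are supplied by you on the command line. It tests reachability only from the machine where it runs, so run it from a system on the Edge management network for a meaningful result.

```python
import socket
import sys


def tcp_port_open(host, port=1234, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Pass each NSX Manager address as a command-line argument.
    for manager in sys.argv[1:]:
        state = "open" if tcp_port_open(manager) else "unreachable"
        print(f"{manager}:1234 is {state}")
```

A successful connection shows only that the port accepts TCP connections from that machine. It does not confirm that the management channel itself is healthy.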

If none of the above options resolves the issue, redeploy the NSX Edge VMs. For details on this process, refer to "Redeploy an NSX Edge Node".

Note that VMware support recommends redeployment, because it produces a fresh VM that is free of any hidden issues the storage loss may have caused.