All NSX manager nodes are down in the cluster after issues with storage/vSAN partitioning

Article ID: 393964

Updated On:

Products

VMware NSX

Issue/Introduction

  • After a storage issue, the NSX Manager VMs may stop working properly. The VM consoles of the manager nodes may show error messages about services failing to start.
  • After the manager nodes are rebooted, the error message may indicate a corrupted file system.
  • An example console message is "Failed to start default target: Transaction for nsx-custom.target/start is destructive (emergency.target has 'start' job queued, but 'stop' is included in transaction)."
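When console output has been captured as text, the message above can be searched for mechanically. The following is a minimal Python sketch, assuming the console text is available as a string; the pattern is built only from the example message quoted in this article, and the function name `find_emergency_boot` is hypothetical.

```python
import re

# Pattern derived from the example console message in this article.
# The unit name is captured rather than hard-coded, on the assumption
# that other builds may report a different target unit.
EMERGENCY_PATTERN = re.compile(
    r"Failed to start default target: "
    r"Transaction for (?P<unit>\S+)/start is destructive"
    r".*emergency\.target has 'start' job queued"
)


def find_emergency_boot(console_text: str) -> list[str]:
    """Return the target units whose start was blocked by emergency.target."""
    return [m.group("unit") for m in EMERGENCY_PATTERN.finditer(console_text)]
```

A non-empty result suggests the node dropped to the emergency target at boot. It does not by itself prove file system corruption.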

Environment

VMware NSX

VMware NSX-T Data Center

Cause

An environmental issue, such as a storage or vSAN partitioning problem, corrupted the file system on the NSX Manager VMs.

Resolution

There are several ways to restore the NSX Manager cluster, depending on whether at least one manager node is still working.

Scenario 1: At least one NSX Manager node is still running, and the NSX UI is available when connecting to that node's IP/FQDN

  • Access the NSX UI and go to System > Appliances.
  • Locate the problem NSX Manager node, then initiate its deletion.
  • Once the problem node has been removed successfully, a new node can be deployed from the same page.
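Whether Scenario 1 applies depends on the NSX UI answering on a node's IP/FQDN. The following is a minimal Python sketch of such a reachability check, using only the standard library. It assumes that any HTTPS response below status 500 means the web front end is up. That is a rough heuristic rather than a documented NSX health check, and the function names are hypothetical.

```python
import ssl
import urllib.error
import urllib.request


def ui_responds(host: str, timeout: float = 5.0) -> bool:
    """Return True if https://<host>/ answers with a non-5xx HTTP status."""
    # Assumption: manager nodes may present self-signed certificates, so
    # verification is disabled for this probe only. Do not reuse this
    # context for requests that carry credentials.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=timeout, context=ctx):
            return True
    except urllib.error.HTTPError as err:
        # The server answered; treat server-side errors as "UI unavailable".
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, TLS failure, and so on.
        return False


def nodes_with_ui(hosts, probe=ui_responds):
    """Return the hosts for which the probe reports a responding UI."""
    return [h for h in hosts if probe(h)]
```

If `nodes_with_ui` returns at least one host, the deletion and redeployment steps above can be carried out from that node's UI. An empty list while a node is still running points to Scenario 1a instead.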

Scenario 1a: At least one NSX Manager node is still running, but the NSX UI is unavailable on every manager node

Scenario 2: None of the NSX Manager nodes boot to a login shell, and all show storage corruption

Scenario 3: None of the NSX Manager nodes boot to a login shell, and no valid backup of the NSX Managers is available

  • Please note that Broadcom Support has no way to restore your NSX environment if no valid backup is available.
  • Please raise a Support Request with Broadcom Support for assistance in attempting to recover the manager nodes.
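The choice between the scenarios above can be summarized as a small decision function. The following is a minimal Python sketch; the class and function names are hypothetical. It also assumes that Scenario 2 applies when a valid backup exists, which the article implies only by contrast with Scenario 3.

```python
from dataclasses import dataclass


@dataclass
class ClusterObservation:
    running_nodes: int   # manager nodes that are still running
    ui_reachable: bool   # NSX UI answers on at least one node's IP/FQDN
    valid_backup: bool   # a valid NSX Manager backup exists and is available


def pick_scenario(obs: ClusterObservation) -> str:
    """Map what was observed to the scenario names used in this article."""
    if obs.running_nodes >= 1:
        # A surviving node: the UI state decides between 1 and 1a.
        return "Scenario 1" if obs.ui_reachable else "Scenario 1a"
    # No node boots to a login shell: the backup decides (assumption, see above).
    return "Scenario 2" if obs.valid_backup else "Scenario 3"
```

For example, `pick_scenario(ClusterObservation(running_nodes=0, ui_reachable=False, valid_backup=False))` returns `"Scenario 3"`, the case in which a Support Request is the only remaining path.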