All NSX manager nodes are down in the cluster after issues with storage/vSAN partitioning
Article ID: 393964
Updated On:
Products
VMware NSX
Issue/Introduction
After a storage issue, NSX Manager VMs may stop working properly. The VM consoles of the manager nodes may show error messages about services failing to start.
After the manager nodes are rebooted, the error message may indicate a corrupted file system.
An example console message is "Failed to start default target: Transaction for nsx-custom.target/start is destructive (emergency.target has 'start' job queued, but 'stop' is included in transaction)."
Environment
VMware NSX
VMware NSX-T Datacenter
Cause
An environmental issue, such as a storage outage or vSAN partition, corrupted the file system of the NSX Manager VMs.
Resolution
There are several ways to restore the NSX Manager cluster, depending on whether at least one manager node is still working.
Scenario 1: At least one NSX Manager node is still running, and the NSX UI is available when connecting to that node's IP/FQDN
Access the NSX UI and go to System, Appliances.
Locate the problem NSX Manager node, then initiate the deletion.
Once the problem node is removed successfully, a new node can be deployed from the same page.
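The problem nodes can also be identified from a cluster status payload before they are deleted. The following is a minimal sketch, not a supported procedure: it assumes the NSX REST API exposes cluster status at `GET /api/v1/cluster/status` with a `mgmt_cluster_status.offline_nodes` list, and node removal at `POST /api/v1/cluster/<node-id>?action=remove_node`. The sample payload, UUIDs, IP addresses, and host name are hypothetical; confirm the endpoints and field names against the API guide for your NSX version.

```python
from typing import Any

# Hypothetical sample shaped like a cluster status response.
# Field names are assumptions; verify them for your NSX version.
SAMPLE_STATUS: dict[str, Any] = {
    "mgmt_cluster_status": {
        "status": "DEGRADED",
        "online_nodes": [
            {"uuid": "aaaa-1111", "mgmt_cluster_listen_ip_address": "192.0.2.11"},
        ],
        "offline_nodes": [
            {"uuid": "bbbb-2222", "mgmt_cluster_listen_ip_address": "192.0.2.12"},
            {"uuid": "cccc-3333", "mgmt_cluster_listen_ip_address": "192.0.2.13"},
        ],
    }
}


def offline_node_ids(status: dict[str, Any]) -> list[str]:
    """Return the UUIDs of manager nodes reported as offline."""
    mgmt = status.get("mgmt_cluster_status", {})
    return [node["uuid"] for node in mgmt.get("offline_nodes", [])]


def removal_url(manager: str, node_id: str) -> str:
    """Build the (assumed) REST URL that removes a node from the cluster."""
    return f"https://{manager}/api/v1/cluster/{node_id}?action=remove_node"


if __name__ == "__main__":
    # Print the removal URL for every offline node; no request is sent.
    for node_id in offline_node_ids(SAMPLE_STATUS):
        print(removal_url("nsx-mgr-01.example.com", node_id))
```

The sketch only builds URLs and sends nothing, so it is safe to run while planning which nodes to remove from the working manager.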
Scenario 1a: At least one NSX Manager node is still running, but the NSX UI is unavailable on all manager nodes
If there is only one NSX Manager showing as unavailable
If file system repair does not allow at least one manager node to boot successfully to the login shell, deploy a brand-new NSX Manager from the OVA file and restore the backup from the SFTP server.
Scenario 3: No NSX Manager node boots to the login shell, and no valid backup of the NSX Managers is available
Please note that Broadcom Support has no way to restore your NSX environment if no valid backup is available.
Please raise a Support Request with Broadcom Support for assistance in attempting to recover the manager nodes.
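The choice between the recovery paths in this article can be summarized as a small decision helper. This is an illustrative sketch only: the function name and its parameters are hypothetical, and it assumes file system repair has already been attempted on any node that does not boot.

```python
def recovery_path(nodes_running: int, ui_available: bool, valid_backup: bool) -> str:
    """Map the observed cluster state to a recovery path from this article."""
    if nodes_running >= 1:
        # A surviving node exists; the path depends on whether the NSX UI is reachable.
        return "Scenario 1" if ui_available else "Scenario 1a"
    if valid_backup:
        # No node boots, but a backup exists on the SFTP server.
        return "Deploy a new NSX Manager from the OVA file and restore the backup"
    # No node boots and no valid backup is available.
    return "Scenario 3: raise a Support Request with Broadcom Support"


if __name__ == "__main__":
    print(recovery_path(nodes_running=0, valid_backup=True, ui_available=False))
```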