"Failed to exit node <Manager UUID> from maintenance mode. Please retry the operation"
VMware NSX
VMware NSX-T Data Center
nsxmgr1> get group maintenance-mode status
Group Type: <name of the service>
Members:
UUID                Leadership Work Completed   Group Update Ack Received   Maintenance Mode Status
<Manager 1 UUID>    True                        False                       MAINTENANCE_MODE_FAILED
<Manager 2 UUID>    False                       False                       MAINTENANCE_MODE_OFF
<Manager 3 UUID>    True                        True                        MAINTENANCE_MODE_OFF
Note: The command "get group maintenance-mode status" must be typed in full because it does not auto-complete.
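When the status output has been captured to text, the rows that still need attention can be picked out with a short script. The following is only a sketch: the function name `nodes_needing_attention` is hypothetical, and it assumes each member row is a single line of four whitespace-separated fields (UUID, two True/False flags, status), which may not match the exact layout printed by every NSX version.

```python
def nodes_needing_attention(status_text):
    """Return (uuid, status) for rows that are not fully True / MAINTENANCE_MODE_OFF.

    Assumption: each member row has exactly four whitespace-separated fields.
    Header and "Group Type" lines do not match that shape and are skipped.
    """
    flagged = []
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        uuid, leadership_done, ack_received, mode = parts
        if leadership_done not in ("True", "False") or ack_received not in ("True", "False"):
            continue
        if (leadership_done != "True"
                or ack_received != "True"
                or mode != "MAINTENANCE_MODE_OFF"):
            flagged.append((uuid, mode))
    return flagged


# Hypothetical sample; "node-1" etc. stand in for real manager UUIDs.
sample = """\
UUID Leadership Work Completed Group Update Ack Received Maintenance Mode Status
node-1 True False MAINTENANCE_MODE_FAILED
node-2 False False MAINTENANCE_MODE_OFF
node-3 True True MAINTENANCE_MODE_OFF
"""
flagged_nodes = nodes_needing_attention(sample)
```

With the sample above, `node-1` and `node-2` are flagged and `node-3` is not.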
Workaround 1
SSH into all three manager nodes as the root user.
/etc/init.d/nsx-ccp restart
get group maintenance-mode status
If "get group maintenance-mode status" shows "True" for all the parameters, go to step (8).
POST https://<nsx-mgr>/api/v1/cluster-manager/nodes/<nsx-mgr1-uuid>?action=maintenance_mode_off
POST https://<nsx-mgr>/api/v1/cluster-manager/nodes/<nsx-mgr2-uuid>?action=maintenance_mode_off
POST https://<nsx-mgr>/api/v1/cluster-manager/nodes/<nsx-mgr3-uuid>?action=maintenance_mode_off
get group maintenance-mode status
If "get group maintenance-mode status" shows "True" for all the parameters, go to step (8).
/etc/init.d/nsx-ccp restart
get group maintenance-mode status
If "get group maintenance-mode status" shows "True" for all the parameters, go to step (8).
Workaround 2
If the manager node is stuck during reboot, log messages like the following appear in /var/log/syslog:
---snip---
reboot.target: Job reboot.target/start timed out.
<Date>T<Time>Z <hostname> systemd 1 - - Timed out starting Reboot.
<Date>T<Time>Z <hostname> systemd 1 - - reboot.target: Job reboot.target/start failed with result
---snip---
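A check for these messages can be scripted. This is a minimal sketch, assuming the syslog has been copied somewhere readable; the helper name `reboot_timeout_lines` is hypothetical, and the marker strings are taken directly from the excerpt above.

```python
# Substrings taken from the log excerpt in this article.
REBOOT_TIMEOUT_MARKERS = (
    "reboot.target: Job reboot.target/start timed out",
    "Timed out starting Reboot",
    "reboot.target: Job reboot.target/start failed with result",
)


def reboot_timeout_lines(lines):
    """Return the lines that contain any of the reboot-timeout markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in REBOOT_TIMEOUT_MARKERS)
    ]


# Example use against a log file (path as given in this article):
#   with open("/var/log/syslog", errors="replace") as f:
#       for hit in reboot_timeout_lines(f):
#           print(hit)
```

An empty result only means these specific strings were not found; it does not rule out other reboot problems.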
If these messages appear, reboot the manager manually and then perform Workaround 1 above.
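Workaround 1 sends the same maintenance_mode_off POST to each manager node. As a rough sketch, the requests can be prepared with Python's standard library. The function names, the host name, and the UUID values below are placeholders, not real values. Authentication and TLS verification are deliberately left out, so the requests are only built here, not sent.

```python
import urllib.request


def maintenance_mode_off_url(nsx_mgr, node_uuid):
    """Build the URL used in Workaround 1 for one manager node."""
    return (
        f"https://{nsx_mgr}/api/v1/cluster-manager/nodes/"
        f"{node_uuid}?action=maintenance_mode_off"
    )


def build_requests(nsx_mgr, node_uuids):
    """Prepare one POST request per manager node (nothing is sent here)."""
    return [
        urllib.request.Request(maintenance_mode_off_url(nsx_mgr, uuid), method="POST")
        for uuid in node_uuids
    ]


# Placeholder host and UUIDs; substitute the values for your own cluster.
requests_to_send = build_requests(
    "nsx-mgr.example.local", ["mgr1-uuid", "mgr2-uuid", "mgr3-uuid"]
)
```

Sending each request (for example with `urllib.request.urlopen`) would additionally require credentials and a suitable TLS context for the NSX Manager, which depend on the environment.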
Resolution:
This is a known issue impacting VMware NSX.
If you instead see the error "Management Plane node failed to enter maintenance mode", refer to the KB below.
During NSX upgrade the Management Plane node failed to enter maintenance mode