Unexpected error while upgrading upgrade unit. Failed to exit node <UUID> from maintenance mode. Please retry the operation.
"get cluster status" shows all services as being UP.
"get group maintenance-mode status" will show output similar to the following:
Group Type: MONITORING
Members:
UUID                                  Leadership Work Completed  Group Update Ack Received  Maintenance Mode Status
########-####-####-####-############  False                      True                       MAINTENANCE_MODE_FAILED
########-####-####-####-############  True                       True                       MAINTENANCE_MODE_OFF
########-####-####-####-############  True                       True                       MAINTENANCE_MODE_OFF
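When the member list is long, it can help to filter the saved command output for members that are not in MAINTENANCE_MODE_OFF. The following is a minimal sketch, not an official tool: it assumes the output of "get group maintenance-mode status" has been copied to a workstation as text, that each member appears on one line in the column order shown above, and it uses made-up placeholder UUIDs. The function name members_not_off is hypothetical.

```python
import re

# Hypothetical sample: placeholder UUIDs, same column order as the output above.
SAMPLE = """\
Group Type: MONITORING
Members:
UUID Leadership Work Completed Group Update Ack Received Maintenance Mode Status
11111111-1111-1111-1111-111111111111 False True MAINTENANCE_MODE_FAILED
22222222-2222-2222-2222-222222222222 True True MAINTENANCE_MODE_OFF
33333333-3333-3333-3333-333333333333 True True MAINTENANCE_MODE_OFF
"""

# One member row: UUID, two True/False columns, then the maintenance mode status.
ROW = re.compile(
    r"^\s*(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s+"
    r"(?P<work_completed>True|False)\s+"
    r"(?P<ack_received>True|False)\s+"
    r"(?P<status>MAINTENANCE_MODE_\w+)\s*$"
)


def members_not_off(cli_output):
    """Return (uuid, status) for each member whose status is not MAINTENANCE_MODE_OFF."""
    stuck = []
    for line in cli_output.splitlines():
        match = ROW.match(line)
        if match and match.group("status") != "MAINTENANCE_MODE_OFF":
            stuck.append((match.group("uuid"), match.group("status")))
    return stuck


if __name__ == "__main__":
    for uuid, status in members_not_off(SAMPLE):
        print(f"{uuid}: {status}")
```

Lines that do not match the expected row layout (headers, blank lines) are skipped, so the sketch reports nothing rather than guessing if the output format differs.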
/var/log/cbm/tanuki.log on the NSX manager node with FAILED status contains entries similar to the following:
INFO | jvm 27 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper | The JVM has run out of memory. Requesting thread dump.
STATUS | wrapper | Dumping JVM state.
STATUS | wrapper | The JVM has run out of memory. Restarting JVM.
INFO | jvm 27 | Dumping heap to /image/core/cbm_oom.hprof ...
A heap dump file is present at /image/core/cbm_oom.hprof on that node.
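To confirm the out-of-memory signature in a copy of the log (for example, one extracted from a support bundle), a simple text scan is enough. The sketch below is illustrative only: the marker strings are taken from the log excerpt above, while the function name find_oom_lines and the idea of running it on a copied file are assumptions, not part of any NSX tooling.

```python
# Marker strings copied from the tanuki.log excerpt in this article.
OOM_MARKERS = (
    "java.lang.OutOfMemoryError",
    "The JVM has run out of memory",
)

# Sample lines reproduced from the excerpt above.
SAMPLE_LOG = [
    "INFO | jvm 27 | java.lang.OutOfMemoryError: Java heap space",
    "STATUS | wrapper | The JVM has run out of memory. Requesting thread dump.",
    "STATUS | wrapper | Dumping JVM state.",
    "STATUS | wrapper | The JVM has run out of memory. Restarting JVM.",
    "INFO | jvm 27 | Dumping heap to /image/core/cbm_oom.hprof ...",
]


def find_oom_lines(lines):
    """Return (line_number, text) for each line that contains an OOM marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in OOM_MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # For a real file, pass an open file object instead of SAMPLE_LOG, e.g.
    # open("tanuki.log", errors="replace") on a local copy of the log.
    for number, text in find_oom_lines(SAMPLE_LOG):
        print(f"{number}: {text}")
```

A match only shows that the CBM JVM ran out of heap at some point; it should be read together with the MAINTENANCE_MODE_FAILED status before concluding that this article applies.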
VMware NSX
The issue is caused by a CBM out-of-memory error that occurs while the phonehome-coordinator service is starting. Because of this error, the nodes do not pick up "ShardingMaster" leadership, which leaves the NSX manager in the failed maintenance-mode state described above.
This issue is resolved in VMware NSX 9.0.0, available at Broadcom Downloads.
If you are having difficulty finding and downloading software, please review the KB article Download Broadcom products and software.
If you believe you have encountered this issue and you require a workaround, please open a support request with Broadcom Support and refer to this KB article.