Unexpected error while upgrading upgrade unit. Failed to exit node <UUID> from maintenance mode. Please retry the operation.
Running "get cluster status" shows Group Type: MONITORING with a status of DOWN.
Entries similar to the following appear in /var/log/phonehome-coordinator/phonehome-coordinator-tomcat-wrapper.log:
INFO   | jvm 5   | Unable to create /image/core/phc_oom.hprof: File exists
INFO   | jvm 5   | Terminating due to java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper | The JVM has run out of memory. Requesting thread dump.
STATUS | wrapper | Dumping JVM state.
ERROR  | wrapper | JVM exited unexpectedly.
STATUS | wrapper | JVM process is gone.
STATUS | wrapper | Launching a JVM.
The heap dump file /image/core/phc_oom.hprof is present on the node.
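To check a log for the same signature, the following Python sketch searches each line for the markers seen in the example excerpt. It assumes you run it against a readable copy of the wrapper log. The marker strings are copied from that example, and the function names find_oom_lines and scan_log_file are hypothetical. Real log lines may be worded differently between builds, so treat a match as a hint rather than a diagnosis.

```python
from typing import Iterable, List, Tuple

# Substrings taken from the example log excerpt (assumption: the wording
# is the same in your build of the phonehome-coordinator wrapper log).
OOM_MARKERS = (
    "java.lang.OutOfMemoryError",
    "The JVM has run out of memory",
    "JVM exited unexpectedly",
)


def find_oom_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped text) for every line containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in OOM_MARKERS):
            hits.append((number, line.strip()))
    return hits


def scan_log_file(path: str) -> List[Tuple[int, str]]:
    """Hypothetical convenience wrapper: read a log file and scan it line by line."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_oom_lines(log)
```

For example, scan_log_file("phonehome-coordinator-tomcat-wrapper.log") on a copied log returns the matching lines; repeated "JVM exited unexpectedly" hits followed by "Launching a JVM" would suggest the service is crashing and restarting.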
Running "get group maintenance-mode status" returns output similar to the following:
Group Type: MANAGER
Members:
UUID                                  Leadership Work Completed  Group Update Ack Received  Maintenance Mode Status
########-####-####-####-############  False                      True                       MAINTENANCE_MODE_FAILED
########-####-####-####-############  True                       True                       MAINTENANCE_MODE_OFF
########-####-####-####-############  True                       True                       MAINTENANCE_MODE_OFF

Group Type: ASYNC_REPLICATOR
Members:
UUID                                  Leadership Work Completed  Group Update Ack Received  Maintenance Mode Status
########-####-####-####-############  False                      True                       MAINTENANCE_MODE_FAILED
########-####-####-####-############  True                       True                       MAINTENANCE_MODE_OFF
########-####-####-####-############  True                       True                       MAINTENANCE_MODE_OFF
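When the output lists many groups, a small parser can pick out the members stuck in MAINTENANCE_MODE_FAILED. The Python sketch below is a minimal, hypothetical helper (failed_members is not an NSX tool). It assumes each member row is on one line in the layout of the example: a UUID, two True/False columns, then the status token. Any other layout is silently skipped, so check the raw output as well.

```python
import re
from typing import Dict, List, Optional

GROUP_RE = re.compile(r"^Group Type:\s*(\S+)")
# Assumed row layout from the example output: UUID, two True/False
# columns, then a MAINTENANCE_MODE_* status token.
ROW_RE = re.compile(
    r"^(?P<uuid>\S+)\s+(?:True|False)\s+(?:True|False)\s+"
    r"(?P<status>MAINTENANCE_MODE_\w+)\s*$"
)


def failed_members(output: str) -> Dict[str, List[str]]:
    """Map each group type to the member UUIDs reporting MAINTENANCE_MODE_FAILED."""
    result: Dict[str, List[str]] = {}
    group: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        header = GROUP_RE.match(line)
        if header:
            # Start collecting rows for a new group.
            group = header.group(1)
            result.setdefault(group, [])
            continue
        row = ROW_RE.match(line)
        # Header lines such as "Members:" or the column titles never match ROW_RE.
        if row and group is not None and row.group("status") == "MAINTENANCE_MODE_FAILED":
            result[group].append(row.group("uuid"))
    return result
```

Groups with no failed members map to an empty list. A UUID that appears here is presumably the node named in the upgrade error, but confirm that against the error message.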
Note: The above log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
VMware NSX
This issue is caused by a race condition during initialization in which the phonehome-coordinator (Monitoring) service runs out of Java heap memory, crashes, and fails to start. As a result, the upgrade cannot continue.
Below is a workaround:
For more information, see NSX Manager VM and Host Transport Node System Requirements.
Note: This issue is fixed in NSX 4.2.1.1 and later.
For additional information see Troubleshooting NSX Manager Upgrade Failures.