"Unexpected error while upgrading upgrade unit. Failed to exit node <UUID> from maintenance mode" while upgrading NSX Manager

Article ID: 406870

Products

VMware NSX

Issue/Introduction

  • NSX management node upgrade fails with an error: "Unexpected error while upgrading upgrade unit. Failed to exit node <UUID> from maintenance mode. Please retry the operation."
  • When logged in to a Manager appliance as admin, the CLI command "get cluster status" shows all services as UP.
  • From the same admin session, the CLI command "get group maintenance-mode status" shows output similar to the following:
Group Type: MONITORING
Members:
  UUID                                   Leadership Work Completed   Group Update Ack Received   Maintenance Mode Status
  ########-####-####-####-############   False                       True                        MAINTENANCE_MODE_FAILED
  ########-####-####-####-############   True                        True                        MAINTENANCE_MODE_OFF
  ########-####-####-####-############   True                        True                        MAINTENANCE_MODE_OFF
  • Log lines similar to the following appear in /var/log/cbm/tanuki.log on the NSX Manager node that reports MAINTENANCE_MODE_FAILED:
INFO   | jvm 27   | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper  | The JVM has run out of memory.  Requesting thread dump.
STATUS | wrapper  | Dumping JVM state.
STATUS | wrapper  | The JVM has run out of memory.  Restarting JVM.
INFO   | jvm 27   | Dumping heap to /image/core/cbm_oom.hprof ...
 
  • A Java heap dump file is present at /image/core/cbm_oom.hprof
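
The symptom checks above can be sketched as a small offline helper. This is a minimal illustration only, assuming the output of "get group maintenance-mode status" and a copy of tanuki.log have been saved as text and are analyzed away from the appliance. The function names (failed_members, oom_lines) are hypothetical and are not part of NSX. The status parser assumes each member row consists of four whitespace-separated fields, as in the sample output.

```python
# Offline triage sketch (hypothetical helper, not an NSX tool).
# Input: saved text of "get group maintenance-mode status" and of tanuki.log.

# Signatures copied from the tanuki.log excerpt in this article.
OOM_MARKERS = (
    "java.lang.OutOfMemoryError: Java heap space",
    "The JVM has run out of memory",
    "Dumping heap to",
)


def failed_members(status_text):
    """Return the UUIDs of members reporting MAINTENANCE_MODE_FAILED.

    Assumes a member row has exactly four whitespace-separated fields:
    UUID, Leadership Work Completed, Group Update Ack Received, status.
    """
    failed = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[3] == "MAINTENANCE_MODE_FAILED":
            failed.append(fields[0])
    return failed


def oom_lines(log_text):
    """Return the log lines that contain any of the OOM signatures."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in OOM_MARKERS)
    ]


if __name__ == "__main__":
    # Sample data mirrors the masked output shown in this article.
    status = (
        "Group Type: MONITORING\n"
        "Members:\n"
        "  UUID  Leadership Work Completed  Group Update Ack Received  Maintenance Mode Status\n"
        "  ########-####-####-####-############  False  True  MAINTENANCE_MODE_FAILED\n"
        "  ########-####-####-####-############  True   True  MAINTENANCE_MODE_OFF\n"
    )
    log = (
        "INFO   | jvm 27   | java.lang.OutOfMemoryError: Java heap space\n"
        "STATUS | wrapper  | The JVM has run out of memory.  Restarting JVM.\n"
    )
    print("Failed members:", failed_members(status))
    print("OOM log lines:", len(oom_lines(log)))
```

A node that is listed by failed_members and whose tanuki.log yields matches from oom_lines is consistent with the symptoms described here. The sketch does not confirm the root cause on its own.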

Environment

VMware NSX

Cause

The issue occurs when the CBM service runs out of Java heap memory while the phonehome-coordinator service is starting. As a result, the affected node does not pick up "ShardingMaster" leadership, and the NSX Manager node fails to exit maintenance mode.

Resolution

This issue is resolved in VMware NSX 9.0.0, available at Broadcom Downloads.
If you are having difficulty finding and downloading software, please review the KB article Download Broadcom products and software.

If you believe you have encountered this issue and you require a workaround, please open a support request with Broadcom Support and refer to this KB article.