NSX Manager node upgrade failed at step "Entering maintenance mode".

Article ID: 383715

Products

VMware NSX

Issue/Introduction

  • NSX Upgrade UI shows: Failed to put node ########-####-####-####-############ in maintenance mode. Please retry the operation after checking 'get group maintenance-mode status' CLI.
  • The NSX Manager node failed to enter maintenance mode. Log lines similar to the following appear in /var/log/upgrade-coordinator/upgrade-coordinator.log:

NSX SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Checking if using old workflow - mpNodesGroup Group [id=MPNodesGroup, name=Node OS Upgrade, upgradeMethod=SERIAL, upgradeUnits=[UpgradeUnit [id=########-####-####-####-############, TransportNodeID =null, name=#########, description=#######, type=MP, upgradeUnitSubtype=RESOURCE, currentVersion=#.#.#.#.########, warnings=[], errors=[{"moduleName":"upgrade-coordinator","errorCode":30062,"errorMessage":"Unexpected error while upgrading upgrade unit: Failed to put node ########-####-####-####-############ in maintenance mode. Please retry the operation after checking 'get group maintenance-mode status' CLI."}], metaData={manager_ip=##.##.##.##, UU_TYPE=NODE}, rebooting=false, UaReportedStatus=NOT_SYNCED, extendedConfiguration=[KeyValuePair [key=MP_UPGRADE_WORKFLOW, value =ROLLING, class=class com.vmware.nsx.management.upgrade.model.KeyValuePair, hashCode=#######]], progressTracker=UpgradeUnitProgressCollectorImpl [reference=######-[...]

  • The node failed to enter maintenance mode because the CCP did not send the group update ack. Log lines similar to the following appear in /var/log/syslog:

NSX - [nsx@6876 comp="nsx-manager" level="INFO" s2comp="maintenance-mode-helper" subcomp="ccp"] Did not receive group membership update ack or leadership work is not completed for member ########-####-####-####-############ in group ########-####-####-####-############ in 12 minutes.
NSX - [nsx@6876 comp="nsx-manager" level="INFO" s2comp="maintenance-mode-helper" subcomp="ccp"] Updating maintenance mode status to MAINTENANCE_MODE_FAILED in GroupMaintenanceMode for memberId ########-####-####-####-############, groupId ########-####-####-####-############

  • Running get group maintenance-mode status from the NSX CLI (over SSH) on the affected NSX Manager shows Maintenance Mode Status as MAINTENANCE_MODE_FAILED and Group Update Ack Received as False.

nsxmgr1> get group maintenance-mode status
Group Type: CONTROLLER
Members:
    UUID                   Leadership Work Completed              Group Update Ack Received              Maintenance Mode Status
    <Manager 1 UUID>       True                                   False                                  MAINTENANCE_MODE_FAILED
    <Manager 2 UUID>       True                                   True                                   MAINTENANCE_MODE_OFF
    <Manager 3 UUID>       True                                   True                                   MAINTENANCE_MODE_OFF
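
When the output above is captured to a file or pasted from a terminal, the affected member can be picked out programmatically. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the column layout shown above (UUID, two True/False columns, then a MAINTENANCE_MODE_* status); the names `parse_members` and `unhealthy` are hypothetical helpers introduced for illustration.

```python
import re

# Sample text copied from the CLI output in this article.
SAMPLE = """\
Group Type: CONTROLLER
Members:
    UUID                   Leadership Work Completed              Group Update Ack Received              Maintenance Mode Status
    <Manager 1 UUID>       True                                   False                                  MAINTENANCE_MODE_FAILED
    <Manager 2 UUID>       True                                   True                                   MAINTENANCE_MODE_OFF
    <Manager 3 UUID>       True                                   True                                   MAINTENANCE_MODE_OFF
"""

# Assumed row shape: identifier, two booleans, then the status keyword.
ROW = re.compile(
    r"^\s*(?P<uuid>\S.*?)\s{2,}(?P<work>True|False)\s+"
    r"(?P<ack>True|False)\s+(?P<status>MAINTENANCE_MODE_\w+)\s*$"
)


def parse_members(text):
    """Return one dict per member row; header and other lines are skipped."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            members.append({
                "uuid": match.group("uuid"),
                "leadership_work_completed": match.group("work") == "True",
                "group_update_ack_received": match.group("ack") == "True",
                "status": match.group("status"),
            })
    return members


def unhealthy(members):
    """Members that failed maintenance mode or never received the ack."""
    return [
        m for m in members
        if m["status"] == "MAINTENANCE_MODE_FAILED"
        or not m["group_update_ack_received"]
    ]


if __name__ == "__main__":
    for member in unhealthy(parse_members(SAMPLE)):
        print(member["uuid"], member["status"])
```

If the real output uses different column names or spacing, the `ROW` pattern would need to be adjusted accordingly.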

 

Note: The command get group maintenance-mode status must be typed in full because it does not auto-complete.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
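
To check whether a given /var/log/syslog contains the two CCP messages quoted earlier, a small scan such as the following can help. This is an illustrative Python sketch only: the regular expressions are derived from the example lines in this article, so any difference in wording on a particular build would require adjusting them, and `scan` is a hypothetical helper name.

```python
import re

MASKED_ID = "########-####-####-####-############"  # placeholder as shown in this article

# Example lines taken from the syslog excerpt in this article.
SAMPLE_LINES = [
    'NSX - [nsx@6876 comp="nsx-manager" level="INFO" s2comp="maintenance-mode-helper" '
    'subcomp="ccp"] Did not receive group membership update ack or leadership work is '
    f"not completed for member {MASKED_ID} in group {MASKED_ID} in 12 minutes.",
    'NSX - [nsx@6876 comp="nsx-manager" level="INFO" s2comp="maintenance-mode-helper" '
    'subcomp="ccp"] Updating maintenance mode status to MAINTENANCE_MODE_FAILED in '
    f"GroupMaintenanceMode for memberId {MASKED_ID}, groupId {MASKED_ID}",
]

TIMEOUT = re.compile(
    r"Did not receive group membership update ack or leadership work is not "
    r"completed for member (?P<member>\S+) in group (?P<group>\S+) in \d+ minutes"
)
STATUS = re.compile(
    r"Updating maintenance mode status to (?P<status>\w+) in GroupMaintenanceMode "
    r"for memberId (?P<member>[^,\s]+), groupId (?P<group>\S+)"
)


def scan(lines):
    """Return (event, member_id, group_id) tuples for matching log lines."""
    events = []
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            events.append(("ack_timeout", match.group("member"), match.group("group")))
            continue
        match = STATUS.search(line)
        if match and match.group("status") == "MAINTENANCE_MODE_FAILED":
            events.append(
                ("maintenance_mode_failed", match.group("member"), match.group("group"))
            )
    return events


if __name__ == "__main__":
    for event in scan(SAMPLE_LINES):
        print(event)
```

On an appliance, `scan` could be given an open file handle instead of `SAMPLE_LINES`, for example `scan(open("/var/log/syslog", errors="replace"))`.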

Environment

This is a known issue affecting NSX versions earlier than 4.1.1.
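
As a quick way to tell whether a version string falls in the affected range, the comparison can be sketched in Python as below. This is an assumption-laden illustration: it treats the first three dot-separated fields as major.minor.patch and ignores anything after them (such as a build number), and the version strings used in the usage lines are made-up examples, not taken from a real deployment.

```python
FIXED_IN = (4, 1, 1)  # first release containing the fix, per this article


def is_affected(version: str) -> bool:
    """True if the version is earlier than 4.1.1.

    Only the first three numeric fields are compared; missing fields are
    treated as 0. Non-numeric fields raise ValueError.
    """
    fields = tuple(int(part) for part in version.split(".")[:3])
    padded = fields + (0,) * (3 - len(fields))
    return padded < FIXED_IN


if __name__ == "__main__":
    # Illustrative version strings only.
    for candidate in ("3.2.3", "4.1.0", "4.1.1", "4.2"):
        print(candidate, is_affected(candidate))
```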

Cause

This issue occurs when CCP (Central Control Plane) messages are not received in time. In the syslog example above, the group membership update ack was not received within 12 minutes.

Resolution

This issue is resolved in VMware NSX 4.1.1, available at Broadcom Downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

As a workaround:

  1. Repair maintenance mode on the affected appliance using one of the following:
    • Restart the controller service on all NSX Managers (as root): /etc/init.d/nsx-ccp restart
    • Or reboot the affected NSX Manager node.
  2. Retry the upgrade.