NSX Manager upgrade failing with error "Unexpected error while upgrading upgrade unit".

Article ID: 400165

Updated On:

Products

VMware NSX

Issue/Introduction

  • The issue occurs during an NSX upgrade from 4.1.x to 4.2.x.
  • The NSX Manager upgrade fails at the Node OS Upgrade stage (1%).
  • The download of the NUB (Node Upgrade Bundle) fails on the NSX Manager node/UA, which causes the NSX Manager upgrade to fail with the error below.

    Error: "Unexpected error while upgrading upgrade unit: [MPP] Node upgrade failed : Download and verify bundle failed with msg: Closing connection 0."



  • Log lines similar to the following appear on the NSX Manager in /var/log/upgrade-coordinator/upgrade-coordinator.log:
    NSX 2701320 - [nsx@6876 comp="nsx-manager" subcomp="curl_wrapper" username="ua" level="INFO"] certificate verification ############################################################e915 from <manager_ip_address/FQDN>:443 failed: SSL: no alternative certificate subject name matches target host name '<manager_ip_address/FQDN>'

    NSX 2701320 - [nsx@6876 comp="nsx-manager" subcomp="curl_wrapper" username="ua" level="INFO"] Closing connection 0

    NSX 2701320 - [nsx@6876 comp="nsx-manager" subcomp="curl_wrapper" username="ua" level="INFO"] /opt/vmware/nsx-common/python/nsx_utils/curl_wrapper exit code 51

    NSX 1180 - [nsx@6876 comp="nsx-manager" subcomp="upgrade-agent" tid="1362" level="ERROR" errorCode="MPA50007"] Error downloading nub 'https://<manager_ip_address/FQDN>/repository/4.2.1.3.0.24533884/Manager/nub/VMware-NSX-unified-appliance-4.2.1.3.0.24533887.nub', output msg: , error msg: * Trying (with httplib) <manager_ip_address/FQDN>:443 ... #012* certificate verification ############################################################e915 from <manager_ip_address/FQDN>:443 failed: SSL: no alternative certificate subject name matches target host name '<manager_ip_address/FQDN>'#012* Closing connection 0#012curl_wrapper: (51) SSL: no alternative certificate subject name matches target host name '<manager_ip_address/FQDN>'#012

    NSX 1180 - [nsx@6876 comp="nsx-manager" subcomp="upgrade-agent" tid="1362" level="ERROR" errorCode="MPA50006"] Error preparing upgrade

    NSX 1180 - [nsx@6876 comp="nsx-manager" subcomp="upgrade-agent" tid="1362" level="INFO"] [SendMsg] Sending message (type : com.vmware.nsx.upgrade_agent.PrepareUpgradeResponseMsg, len:388)

    NSX 1754 - [nsx@6876 comp="nsx-manager" subcomp="node-mgmt" username="root" level="WARNING"] Failed to check DNS entries for VIP with error reason: Traceback (most recent call last):#012 File "/opt/vmware/nsx-node-api/bin/python/management_api/napi/root/alarms/manager_health_event.py", line 194, in dual_stack_missing_dns_entry_vip_callback#012 if ipv4_fqdn_name and ipv6_fqdn_name:#012UnboundLocalError: local variable 'ipv6_fqdn_name' referenced before assignment
    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
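
To confirm quickly whether a log shows this signature, the SAN-mismatch message can be searched for directly. The following is a minimal sketch, not a VMware tool: the function name find_san_mismatch_hosts is hypothetical, and the pattern is taken only from the message text shown in the excerpts above.

```python
import re

# Message text copied from the log excerpts above; the quoted host is captured.
SAN_MISMATCH = re.compile(
    r"no alternative certificate subject name matches target host name '([^']+)'"
)


def find_san_mismatch_hosts(log_lines):
    """Return the distinct target hosts named in SAN-mismatch messages.

    Hosts are returned in the order they are first seen.
    """
    hosts = []
    for line in log_lines:
        for host in SAN_MISMATCH.findall(line):
            if host not in hosts:
                hosts.append(host)
    return hosts
```

One way to use it is to open /var/log/upgrade-coordinator/upgrade-coordinator.log and pass the file object as log_lines. Each host returned is a name or IP address that the presented certificate did not cover.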

Environment

VMware NSX

Cause

This can occur when a CA-signed certificate is in use and the SAN field of the Manager's REST API certificate does not include the NSX Manager cluster VIP together with every NSX Manager node's FQDN and IP address.
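
As an illustration of this condition, the sketch below compares a list of SAN entries (for example, copied from the certificate details) against the names and addresses the cluster needs. It is a simplified approximation of hostname matching, not the check NSX itself performs. The function names and the example hostnames are hypothetical, and a wildcard is assumed to cover exactly one left-most DNS label.

```python
import ipaddress


def _is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def san_covers(name, san_entries):
    """Return True if name (FQDN or IP) is matched by any SAN entry."""
    if _is_ip(name):
        target = ipaddress.ip_address(name)
        return any(
            _is_ip(entry) and ipaddress.ip_address(entry) == target
            for entry in san_entries
        )
    name = name.lower().rstrip(".")
    for entry in san_entries:
        entry = entry.lower().rstrip(".")
        if _is_ip(entry):
            continue
        if entry == name:
            return True
        # Assumption: "*.example.com" covers one left-most label only.
        if entry.startswith("*."):
            _, _, parent = name.partition(".")
            if parent and parent == entry[2:]:
                return True
    return False


def missing_from_san(required_names, san_entries):
    """Return the required names that no SAN entry covers."""
    return [n for n in required_names if not san_covers(n, san_entries)]
```

For example, calling missing_from_san with the VIP FQDN, the VIP IP address, and each node's FQDN and IP address as required_names returns the entries that still need to be added to the certificate. An empty list means every listed name is covered.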

Resolution

This is a condition that may occur in a VMware NSX environment.

The following workarounds address the CA-signed certificate or DNS issue on the NSX Manager node that is preventing the NSX Manager node upgrade.

Workaround 1

  • Resolve the certificate issue by reissuing the CA-signed certificate so that its Subject Alternative Name includes every NSX Manager VIP and every NSX Manager node's FQDN and IP address, or matching wildcard entries. Then re-import the certificate and apply it to the NSX Manager VIP and nodes.
 
Workaround 2

Workaround 3 (the same error can appear if FQDN resolution is failing)
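
As a rough diagnostic for the FQDN resolution case only, the sketch below checks whether each Manager or VIP FQDN resolves from the machine where it is run. It uses the standard socket.getaddrinfo call. The function name resolve_all and the injectable resolver parameter are illustrative assumptions, and a successful lookup here does not prove that the NSX Manager nodes themselves resolve the names the same way.

```python
import socket


def resolve_all(fqdns, resolver=socket.getaddrinfo):
    """Map each FQDN to its sorted resolved addresses, or None on failure."""
    results = {}
    for fqdn in fqdns:
        try:
            infos = resolver(fqdn, 443)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            results[fqdn] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        results[fqdn] = sorted({info[4][0] for info in infos})
    return results
```

Any FQDN mapped to None did not resolve and is worth checking in DNS before the upgrade is retried.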

 
Once the certificate or DNS resolution is fixed, follow the steps below, depending on the current status of the NSX Manager nodes:

  • If any NSX Manager node in the cluster was upgraded, roll back the NSX Manager to the previous version and then re-initiate the upgrade.
    https://techdocs.broadcom.com/us/en/vmware-cis/nsx/vmware-nsx/4-2/upgrade-guide/upgrading-nsxt/upgrading-management-plane/upgrade-management-plane-from-nsx-3-2-1-x-and-later.html
  • If no NSX Manager node upgrade was started or completed, check whether any NSX Manager is in maintenance mode. Get the maintenance mode status of the Managers in the cluster by running the following CLI command or API call:
     get group maintenance-mode status
     GET https://<manager_fqdn/IP>/api/v1/cluster-manager/group-maintenance-mode
  • If any NSX Manager is in maintenance mode (Maintenance mode ON), run the following API call against that Manager to return the system to a stable state:
     POST api/v1/cluster-manager/nodes/########-####-####-############?action=maintenance_mode_off
      Repeat GET https://<manager_fqdn/IP>/api/v1/cluster-manager/group-maintenance-mode until all entries show MAINTENANCE_MODE_OFF.
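
The repeated GET can be sketched as a small polling loop. This article does not show the layout of the API response, so the helper below makes no assumption about field names: it scans the parsed JSON for any string value beginning with MAINTENANCE_MODE_ and treats the cluster as stable only when at least one such value exists and all of them equal MAINTENANCE_MODE_OFF. The function names, the fetch callable (which is expected to perform the authenticated GET and return parsed JSON), and the default interval and attempt count are all hypothetical.

```python
import time


def maintenance_states(payload):
    """Collect every string value starting with 'MAINTENANCE_MODE_'."""
    found = []
    if isinstance(payload, dict):
        for value in payload.values():
            found.extend(maintenance_states(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(maintenance_states(item))
    elif isinstance(payload, str) and payload.startswith("MAINTENANCE_MODE_"):
        found.append(payload)
    return found


def wait_for_maintenance_off(fetch, interval=10.0, max_attempts=30,
                             sleep=time.sleep):
    """Poll fetch() until every reported state is MAINTENANCE_MODE_OFF.

    Returns True once all states are off, or False after max_attempts polls.
    """
    for attempt in range(max_attempts):
        states = maintenance_states(fetch())
        if states and all(s == "MAINTENANCE_MODE_OFF" for s in states):
            return True
        if attempt < max_attempts - 1:
            sleep(interval)
    return False
```

Passing the request in as fetch keeps authentication and certificate handling outside the loop, which matters here because the Manager certificate may be the very thing under repair.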
 
Once all NSX Managers are stable and none is in maintenance mode, retry the failed upgrade from the NSX UI.

If you have encountered this issue and need assistance with rolling back the NSX Manager upgrade, open a support request with Broadcom Support and refer to this KB article.