NSX-T Manager upgrade failed to 3.1.x

Article ID: 324390

Products

VMware NSX

Issue/Introduction

  • The Manager upgrade to NSX-T 3.1.x is failing.
  • The Manager upgrade logs (/var/log/resume-upgrade.log) indicate the following:
2021-08-05 14:49:14,003 - Resuming paused playbook /var/vmware/nsx/file-store/VMware-NSX-manager-3.1.3.0.0.18329005-playbook.yml
2021-08-05 14:49:14,004 - Validating playbook /var/vmware/nsx/file-store/VMware-NSX-manager-3.1.3.0.0.18329005-playbook.yml
2021-08-05 14:49:14,044 - Running "run_migration_tool" (step 7 of 14)
2021-08-05 14:52:59,586 - Running "run_datastore_compactor" (step 8 of 14)
2021-08-05 14:57:10,347 - Running "start_manager" (step 9 of 14)
2021-08-05 15:27:19,959 - Playbook failed at step start_manager. Run the command 'set debug-mode' followed by 'start upgrade-bundle VMware-NSX-unified-appliance-3.1.3.0.0.18329005 step get_upgrade_task_history' for more info.
{
"state": 2,
"state_text": "CMD_ERROR",
"info": "[MUS] UpgradeError: Playbook failed at step start_manager. Run the command 'set debug-mode' followed by 'start upgrade-bundle VMware-NSX-unified-appliance-3.1.3.0.0.18329005 step get_upgrade_task_history' for more info.",
"body": null
}
  • Step 9 (start_manager) fails with a timeout after 1 hour because the Manager service does not start.
  • Resuming the upgrade does not fix the issue.
  • In the Manager syslog (/var/log/syslog), the following message appears multiple times:
2021-08-05 17:44:14: Waiting for management cluster to become stable wait_for_proton: resp_status: 200, body: { \"application_status\" : \"FAILED\", \"application_status_details\" : \"Error creating bean with name 'switchingConfiguration': Unsatisfied dependency expressed through method 'setVcFullSyncer' parameter 0; nested exception is
...
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'clusterCertificateService': Invocation of init method failed; nested exception is com.vmware.nsx.management.truststore.exceptions.InvalidDataException: Certificate chain validation failed. Make sure a valid chain is provided in order leaf,intermediate,root certificate.\",
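
To find the failing step without reading the whole log, the excerpt above can be scanned with a short script. This is a minimal sketch: the two regular expressions are inferred from the log lines quoted in this article rather than from a documented log format, and the function name `summarize_upgrade_log` is hypothetical.

```python
import re

# Both patterns are derived from the log excerpt in this article,
# not from a documented log format, so treat them as assumptions.
RUNNING_RE = re.compile(r'Running "(\w+)" \(step (\d+) of (\d+)\)')
FAILED_RE = re.compile(r"Playbook failed at step (\w+)\.")


def summarize_upgrade_log(text):
    """Return (steps_run, failed_step) for the given log text.

    steps_run is a list of (name, number, total) tuples in log order;
    failed_step is the step name from the first failure line, or None.
    """
    steps_run = [
        (name, int(number), int(total))
        for name, number, total in RUNNING_RE.findall(text)
    ]
    match = FAILED_RE.search(text)
    return steps_run, (match.group(1) if match else None)


# Lines copied from the excerpt above; the failure line is shortened.
SAMPLE = '''\
2021-08-05 14:49:14,044 - Running "run_migration_tool" (step 7 of 14)
2021-08-05 14:52:59,586 - Running "run_datastore_compactor" (step 8 of 14)
2021-08-05 14:57:10,347 - Running "start_manager" (step 9 of 14)
2021-08-05 15:27:19,959 - Playbook failed at step start_manager.
'''

steps, failed = summarize_upgrade_log(SAMPLE)
for name, number, total in steps:
    print(f"step {number}/{total}: {name}")
print("failed step:", failed)
```

On a Manager node, the same function could be given the contents of /var/log/resume-upgrade.log instead of the inline sample.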



Environment

VMware NSX-T Data Center

Cause

As highlighted by the message, the certificate validation is failing: 
Certificate chain validation failed. Make sure a valid chain is provided in order leaf,intermediate,root certificate.

Prior to 3.1.x, no certificate chain validation was performed.
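
The ordering requirement in the error message (leaf, intermediate, root) can be illustrated with a name-based check. This is only a sketch under stated assumptions: the subject and issuer of each certificate are assumed to have been read beforehand (for example from the certificate details in the UI), the `CertNames` type and `first_chain_order_error` function are hypothetical, and the check compares names only. It does not verify signatures and is not the validation logic NSX-T itself runs.

```python
from typing import List, NamedTuple, Optional


class CertNames(NamedTuple):
    """Subject and issuer of one certificate, as plain strings."""
    subject: str
    issuer: str


def first_chain_order_error(chain: List[CertNames]) -> Optional[str]:
    """Return a description of the first ordering problem, or None.

    Expected order: leaf first, then intermediates, root last.
    Assumption: the root certificate is self-signed (issuer == subject).
    """
    if not chain:
        return "chain is empty"
    # Each certificate must be issued by the one that follows it.
    for i in range(len(chain) - 1):
        if chain[i].issuer != chain[i + 1].subject:
            return (
                f"certificate {i} is issued by {chain[i].issuer!r}, "
                f"but certificate {i + 1} is {chain[i + 1].subject!r}"
            )
    root = chain[-1]
    if root.issuer != root.subject:
        return f"last certificate {root.subject!r} is not self-signed"
    return None


# Illustrative names only; they do not come from a real environment.
good = [
    CertNames("CN=nsx-mgr.example.com", "CN=Example Issuing CA"),
    CertNames("CN=Example Issuing CA", "CN=Example Root CA"),
    CertNames("CN=Example Root CA", "CN=Example Root CA"),
]
print(first_chain_order_error(good))
print(first_chain_order_error([good[0], good[2], good[1]]))
```

A chain that was imported as leaf, root, intermediate is reported at the first mismatching pair, which is the kind of certificate that needs to be replaced before the upgrade is retried.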

Resolution

  1. Contact Broadcom Support to get assistance with the Manager rollback.
  2. Once the Manager cluster is up and running, navigate to the System tab and review the chain of every certificate.
  3. Replace any certificate whose chain is not in the order leaf, intermediate, root.
  4. Apply the new certificate to the Manager (cluster) service:
curl -k -u 'admin:<admin-pwd>' -X POST "https://<nsx-mgr-ip>/api/v1/cluster/api-certificate?action=set_cluster_certificate&certificate_id=<certificate-id>"
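
The same request can be prepared from a script, which avoids quoting mistakes in the query string. The sketch below only builds the request object and does not send it. The endpoint and query parameters are taken from the curl command above; the function name `build_set_cluster_certificate_request` and the sample values are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_set_cluster_certificate_request(manager, user, password, certificate_id):
    """Build (but do not send) the POST request from the curl command above."""
    query = urllib.parse.urlencode({
        "action": "set_cluster_certificate",
        "certificate_id": certificate_id,
    })
    url = f"https://{manager}/api/v1/cluster/api-certificate?{query}"
    # HTTP Basic authentication, equivalent to curl's -u option.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


# Placeholder values; replace with the real Manager address and certificate ID.
req = build_set_cluster_certificate_request("192.0.2.10", "admin", "pw", "abc")
print(req.get_method(), req.full_url)
```

The request could then be sent with `urllib.request.urlopen`. Note that curl's -k option disables TLS verification; reproducing that requires passing an `ssl` context with verification turned off, which is only appropriate on a trusted management network.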