This KB explains how to resolve the "Failed to run clean VUM DB" error that occurs when applying the SDDC Manager Configuration drift bundle, so that the SDDC Manager upgrade can complete.
Symptoms:
Application of the Configuration drift bundle update for the SDDC Manager on VCF 4.X fails with the error:
Failed to run clean VUM DB.
In the SDDC Manager logs,
/var/log/vmware/lcm/lcm.log:
Note: This log records the upgrade ID. Use the upgrade ID to locate the thirdparty migration logs. The screenshot shows the log path and the upgrade ID.
In sddcmanager_migration_upgrade.log, located at /var/log/vmware/vcf/lcm/thirdparty/upgrades/<upgrade id>/sddcmanager-migration-app/logs/sddcmanager_migration_upgrade.log:
2022-09-14T03:01:24.713+0000 ERROR [vcf_migration,0000000000000000,0000] [c.v.e.s.o.model.error.ErrorFactory,pool-5-thread-15] [6469SC] FAILED_TO_RUN_CLEAN_VUM_DB Failed to run clean VUM DB.
com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Failed to run clean VUM DB.
    at com.vmware.vcf.migration.actions.workarounds.CleanVumDBAction.execute(CleanVumDBAction.java:315)
    at com.vmware.vcf.migration.actions.workarounds.CleanVumDBAction.execute(CleanVumDBAction.java:40)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:144)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:400)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:520)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:124)
    at sun.reflect.GeneratedMethodAccessor598.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:87)
    at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:72)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.vmware.vapi.std.errors.ServiceUnavailable: ServiceUnavailable (com.vmware.vapi.std.errors.service_unavailable) => { messages = [LocalizableMessage (com.vmware.vapi.std.localizable_message) => { id = com.vmware.vapi.endpoint.cis.ServiceUnavailable, defaultMessage = Service unavailable., args = [], params = <null>, localized = <null>
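Once the upgrade ID is known, the check above can be scripted. The following is a minimal sketch, not VCF tooling: the function name `find_vum_db_error` and its optional second argument (an alternate base directory) are hypothetical, and the log path is assumed to be exactly the one quoted above.

```shell
# Sketch: search the migration log of one upgrade for the clean-VUM-DB failure.
# find_vum_db_error and its second argument are hypothetical, not part of VCF.
find_vum_db_error() {
  upgrade_id="$1"
  # Base directory taken from the path quoted in this article; overridable.
  base_dir="${2:-/var/log/vmware/vcf/lcm/thirdparty/upgrades}"
  log_file="$base_dir/$upgrade_id/sddcmanager-migration-app/logs/sddcmanager_migration_upgrade.log"

  if [ ! -r "$log_file" ]; then
    echo "Cannot read log file: $log_file" >&2
    return 2
  fi

  # -n prefixes each matching line with its line number in the log.
  grep -n "FAILED_TO_RUN_CLEAN_VUM_DB" "$log_file"
}
```

On the SDDC Manager appliance this would be called as find_vum_db_error "<upgrade id>", substituting the upgrade ID taken from lcm.log.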
On the Management vCenter server,
applmgmt.log in /var/log/vmware/applmgmt:
2022-08-25T21:02:22 PM CEST [2205]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
  File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 243, in authenticate
    username = token.username
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 486, in username
    return self.get_name_id().value
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 939, in get_name_id
    '//saml2:Subject/saml2:NameID', self.reference)
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 477, in reference
    self.validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 1169, in validate
    reference = super(HolderOfKeyToken, self).validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 505, in validate
    signing_chain = self.validate_certificate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 685, in validate_certificate
    'One or more certificates cannot be verified.')
vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
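To check whether a vCenter shows the same symptom, the applmgmt log can be searched for the error header quoted above. This is a rough sketch under stated assumptions: the function name `count_hok_token_errors` is hypothetical, and the default file path is assumed from the directory and file name given in this article.

```shell
# Sketch: count "Could not parse HOK Token" entries in applmgmt.log.
# count_hok_token_errors is a hypothetical helper, not a vCenter command.
count_hok_token_errors() {
  # Default path assumed from the directory and file name named in this article.
  log_file="${1:-/var/log/vmware/applmgmt/applmgmt.log}"

  if [ ! -r "$log_file" ]; then
    echo "Cannot read log file: $log_file" >&2
    return 2
  fi

  # The header string appears once per logged error, so counting matching
  # lines avoids double-counting the message repeated inside each traceback.
  # Note: grep -c prints 0 and returns exit status 1 when nothing matches.
  grep -c "Could not parse HOK Token" "$log_file"
}
```

A non-zero count alone does not confirm the cause. Check the matching entries for the "One or more certificates cannot be verified." message shown in the traceback.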
Cause:
As part of the Configuration drift bundle, SDDC Manager cleans up some entries in the VUM DB through an API call.
To make that API/SDK connection, SDDC Manager uses an API on the vCenter that passes through the applmgmt service.
The applmgmt service on the vCenter fails because of a problem with the STS certificate in the VMDIR DB, likely caused by unexpected entries in the STS certificate.
Resolution:
0. Take offline snapshots of all vCenters in the SSO domain.
1. Reset the STS certificate on the Management vCenter (https://kb.vmware.com/s/article/76719)
2. Reset the solution user certificates on the Management vCenter using Option 6 of the certificate manager (https://kb.vmware.com/s/article/2112283)
This should also restart all services on that vCenter.
3. Restart services on all remaining vCenters in the SSO domain.
(Use the command: service-control --stop --all && service-control --start --all)
4. If the services on any vCenter fail to start at this point, reset the solution user certificates on that vCenter as well (see Step 2).
5. Attempt the Configuration Drift Bundle update again from the SDDC Manager UI.
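Steps 3 and 4 can be sketched as a small wrapper around the restart command. This is a sketch only: the function name `restart_all_services` and the `SERVICE_CONTROL` override variable are hypothetical, and it assumes `service-control` returns a non-zero exit status when a stop or start fails. The `--status --all` call is there so the state of the services can be reviewed by hand.

```shell
# Sketch: restart all vCenter services and flag a failed restart (steps 3 and 4).
# restart_all_services and SERVICE_CONTROL are hypothetical names.
restart_all_services() {
  sc="${SERVICE_CONTROL:-service-control}"
  rc=0

  # Same command sequence as step 3; capture the first failing exit status.
  "$sc" --stop --all && "$sc" --start --all || rc=$?

  if [ "$rc" -ne 0 ]; then
    echo "Service restart failed (exit status $rc)." >&2
    echo "Consider resetting the solution user certificates on this vCenter (see Step 2)." >&2
  fi

  # Print the current service state for manual review.
  "$sc" --status --all
  return "$rc"
}
```

Run it on each remaining vCenter in turn. A non-zero return value is the cue to go back to Step 2 for that vCenter.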
MODERATE: This process resets the STS certificate, which changes the VMDIR DB. Solution user certificates on one or more vCenters may also be reset. Offline snapshots of all vCenters in the SSO domain are required. Do not proceed without them.