vCenter Server VAMI restore fails at 97% during service startup due to expired certificates
search cancel

vCenter Server VAMI restore fails at 97% during service startup due to expired certificates

book

Article ID: 421819

calendar_today

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • Restoring vCenter Server from a VAMI backup fails during the service startup phase.
  • On vCenter Server, below events are reported :

    • The restore process stops when attempting to start the services.

    • /var/log/vmware/vmon/vmon.log file indicates a critical failure starting the certificateauthority, sps, applmgmt, and vpxd services.
      INFO: Failed to start services in profile ALL. RC=4, stderr=Failed to start certificateauthority, sps, applmgmt, vpxd, observability-vapi, topologysvc, sca, certificatemanagement, vpxd-svcs services. Error: A system error occurred. Check logs for details
      ERROR: Failed to start all the vCenter services. Error: Failed to start services in profile ALL. RC=4, stderr=Failed to start certificateauthority, sps, applmgmt, vpxd, observability-vapi, topologysvc, sca, certificatemanagement, vpxd-svcs services. Error: A system error occurred. Check logs for details
    • /var/log/vmware/applmgmt/reconciliation.log file confirms the reconciliation job failed because services could not initialize.
      ReconciliationManager::main:ReconciliationManager.py:301] ERROR: Failed to complete reconciliation.
      Error Failed to start all services.
      ERROR: Failed to dispatch event com.vmware.applmgmt.reconcil.job.failed.event Err: unidentifiable C++ exception

    • /var/log/vmware/applmgmt/restore.log contains the error:
      [RestoreManager::PostRestore:RestoreManager.py:467] ERROR: Reconciliation job failed to start. Error: error code: 1
      RestoreManager::main:RestoreManager.py:669] ERROR: RestoreManager encountered an exception: error code: 1
      [RestoreManager::HandleRestoreCleanup:RestoreManager.py:512] INFO: Restore cleanup complete.
      [Common::RestoreExitCleanup:Common.py:373] INFO: Wait 20 seconds before restore cleanup
      [Common::RestoreExitCleanup:Common.py:378] INFO: No temp network to cleanup
      [Proc::RunCmd:Proc.py:422] INFO: Executing command /usr/lib/applmgmt/dcui/notify.
      [MainProcess:PID-2730] [RestoreManager::main:RestoreManager.py:730] INFO: Restore job failed.

  • Running the vecs-cli command to audit certificate validity shows expired dates in the "Not After" field:
    for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

Cause

This issue occurs because the backup includes expired Solution User certificates from the original system. The restore process reinstates these expired certificates, preventing the services from initializing.

Resolution

The restore failure is due to expired solution user certificates. Refer to the procedure below to replace the certificates.

  1. Refer to vCert - Scripted vCenter expired certificate replacement to download the vCert utility and upload it to the vCenter server.

  2. Run the vCert tool on the vCenter appliance and select the following options to replace the solution user certificates

    • Option 3: Manage certificates → Option 2. Solution User certificates → Option 1. Replace the Solution User certificate with a VMCA-signed certificate

Note: In case multiple certificates are reported as expired, proceed with Option 6. Reset all certificates with VMCA-signed certificates

Additional Information

For detailed instructions on how to use vCert, please refer to the following KB 
vCert - Expired Certificate Replacement Script