Backup job failures in the SDDC environment were triggered due to the unsuccessful rotation of service account passwords

Article ID: 410750

Updated On:

Products

VMware SDDC Manager

Issue/Introduction

  • Backup jobs in the SDDC environment may fail because the rotation of the vCenter Server service account passwords does not complete successfully.
  • In the SDDC Manager UI, under Management/Workload Domain → Certificate → vCenter Server → Status, the certificate status appears as "Certificate Installation Failed". In this state:

    • The certificate installation has completed successfully on the vCenter Server.
    • The Operations Manager database still records the installation status as FAILED.

Cause

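  • The certificate state on the vCenter Server does not match the status recorded in the Operations Manager database. SDDC Manager reads the vCenter Server resource status as ERROR, so the password rotation stops with "Resource <VC FQDN> is not available/ready." and the password operation fails.

    This behavior can be confirmed through entries in operationmanager.log that show a failed or ERROR state for the affected resource.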
  • /var/log/vmware/vcf/operationmanager/operationmanager.log
    DEBUG [vcf_om,68bf6e92d7d76cb641######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Update operation started asynchronously
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Current stage UNKNOWN
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Successfully obtained old credentials of <VC FQDN>
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Is service account : VCENTER
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Setting old and service credentials for account with entityId: ####-###-###-###-######, entityName: <VC FQDN>, credentialType: SSO, username: <user@domain-name>
    
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.helper.CredentialHelper,om-exec-8] Fetching credentials for entityId==###-###-###-###-####;credentialType==SSO;username==<user@domain-name>;entityType==VCENTER
    
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.helper.CredentialHelper,om-exec-8] Size from Credentials query API 1
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.e.s.c.s.s.ServiceCredentialsHelper,om-exec-8] Getting credentials for target type VCENTER, entity ID ###-###-###-###-####and service type SDDC_MANAGER
    WARN  [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.c.f.p.n.s.s.CredentialMgmtServiceImpl,om-exec-8] Use non-service SSO credentials for vCenter ID ####-####-####-####-#####
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.c.f.p.n.s.s.CredentialMgmtServiceImpl,om-exec-8] Retrieve credentials for id #####-####-####-####-####entityType PSC credentialType SSO
    INFO  [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Population of old and service credentials completed successfully for entity with entityId: #####-###-###-####-#####, entityName: <VC FQDN>, credentialType: SSO, username:<user@domain-name>
    
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] About to do checkBeforeRun...
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.helper.ResourceStateHelper,om-exec-8] Getting status for type VCENTER, id ####-###-###-###-#####
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.helper.ResourceStateHelper,om-exec-8] Status of Resource ####-###-###-###-#####: ERROR
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.helper.ResourceStateHelper,om-exec-8] Status for type VCENTER, id ####-###-####-####-####is true
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.helper.ResourceStateHelper,om-exec-8] Resources are in ERROR state
    ERROR [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Resource <VC FQDN> is not available/ready.
    com.vmware.vcf.passwordmanager.exception.PasswordUpdateException: Resource ##-##-##.####.<VC FQDN>l is not available/ready.
            at com.vmware.vcf.passwordmanager.update.changers.AbstractPasswordChanger.updateAsync(AbstractPasswordChanger.java:636)
            at com.vmware.vcf.passwordmanager.update.changers.AbstractPasswordChanger.doUpdate(AbstractPasswordChanger.java:201)
            at com.vmware.vcf.passwordmanager.rotate.AbstractPasswordTransactionExecutor$1.call(AbstractPasswordTransactionExecutor.java:100)
            at com.vmware.vcf.passwordmanager.rotate.AbstractPasswordTransactionExecutor$1.call(AbstractPasswordTransactionExecutor.java:88)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at com.vmware.vcf.common.tracing.TraceRunnable.run(TraceRunnable.java:59)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
            at java.base/java.lang.Thread.run(Thread.java:840)
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] Error Message : Resource <VC FQDN> is not available/ready., Error Token : ######, Error Cause : {}
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.u.c.AbstractPasswordChanger,om-exec-8] About to mark resource state as error...
    DEBUG [vcf_om,68bf6e92d7d76cb641#######eeb7c,b581] [c.v.v.p.r.AbstractPasswordTransactionExecutor,om-exec-8] Password operation failed for <user@domain-name>
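To check a log for this signature without reading it line by line, the key entries above can be matched with grep. This is a minimal sketch: the function name check_rotation_failure and the OM_LOG override variable are hypothetical, and the three search strings are copied from the sample entries above rather than from a documented list of SDDC Manager messages, so other builds may word them differently.

```shell
# Hypothetical helper: search operationmanager.log for the failure signature
# seen in the sample entries. The patterns are copied from those entries.
OM_LOG="${OM_LOG:-/var/log/vmware/vcf/operationmanager/operationmanager.log}"

check_rotation_failure() {
  om_log="${1:-$OM_LOG}"
  if [ ! -r "$om_log" ]; then
    echo "cannot read $om_log" >&2
    return 2
  fi
  # -n prefixes each match with its line number; -E enables | alternation.
  # grep exits with status 1 when no line matches.
  grep -nE 'Resources are in ERROR state|is not available/ready|Password operation failed' "$om_log"
}
```

Called with no argument, the function reads the default log path shown above; it returns 2 when the file cannot be read and 1 (from grep) when none of the patterns appear.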
    

Resolution

To resolve the issue, follow the steps outlined below:

  1. Create a Snapshot
        Before making any changes, take a non-memory snapshot of the SDDC Manager to ensure you can restore the system if needed.

  2. Access SDDC Manager via SSH
        Open an SSH session to SDDC Manager as the vcf user. Once connected, elevate privileges to root by running:

        su

  3. Identify Failed Certificate Replacement Tasks
        Run the following command to identify any failed certificate replacement tasks:

        /usr/pgsql/13/bin/psql -U postgres -h localhost -d operationsmanager -c "SELECT replacement_status_id, replacement_status, resource_fqdn FROM certificatemanagement.replacement_status WHERE replacement_status='FAILED';"

  4. Update the Status of Failed Tasks
        Using the replacement_status_id obtained in the previous step, update the task status to SUCCESSFUL by executing:

        /usr/pgsql/13/bin/psql -U postgres -h localhost -d operationsmanager -c "UPDATE certificatemanagement.replacement_status SET replacement_status='SUCCESSFUL' WHERE replacement_status_id=<ID>;"

    Note: Replace <ID> with a replacement_status_id returned in step 3, and run the command once for each failed task.

  5. Resolve GUI Errors (if applicable)
        If the UI still shows errors caused by failed certificate workflows, identify those workflows using the domain name:

        /usr/pgsql/13/bin/psql -U postgres -h localhost -d operationsmanager -c "SELECT workflow_id, operation_type, operation_status, start_time FROM certificatemanagement.certificate_operation WHERE domain_name='<Domain_Name>';"

  6. Update the Status of Failed Operations
        Using the workflow_id of each failed operation returned in step 5, update its status to SUCCESSFUL:

        /usr/pgsql/13/bin/psql -U postgres -h localhost -d operationsmanager -c "UPDATE certificatemanagement.certificate_operation SET operation_status='SUCCESSFUL' WHERE workflow_id='<WORKFLOWID>';"

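  7. Remove Stale Certificate Entries

    • List the certificate chain entries by expiry date:
          /usr/pgsql/13/bin/psql -U postgres -h localhost -d operationsmanager -c "SELECT id, server_cert_id, issued_to, expiry_date FROM certificatemanagement.certificate_chain_expiry ORDER BY expiry_date;"

    • Delete each stale entry using its id from the previous query:
          /usr/pgsql/13/bin/psql -U postgres -h localhost -d operationsmanager -c "DELETE FROM certificatemanagement.certificate_chain_expiry WHERE id=<stale_cert_id>;"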

  8. Restart SDDC Manager Services (If Required)
        If necessary, restart the SDDC Manager services so that all changes take effect:

        /opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh
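The database steps 3 to 7 can also be collected into a few shell functions that are sourced in the root shell and run one ID at a time. This is a sketch under stated assumptions, not a supported tool: the function names and the PSQL override variable are hypothetical; the psql path, connection flags, schema, table, and column names are copied from the steps above; and -v ON_ERROR_STOP=1, psql's :'name' variable quoting, and RETURNING are standard psql/PostgreSQL features rather than anything SDDC Manager documents. Passing each ID through -v lets psql quote it, which works for numeric and text IDs alike. Unlike the step 4 command, the update here also requires replacement_status = 'FAILED', so a mistyped ID cannot overwrite a row in another state. Take the snapshot from step 1 first.

```shell
# Hypothetical helpers for steps 3 to 7; not part of SDDC Manager.
# PSQL defaults to the binary used in this article and can be overridden.
PSQL="${PSQL:-/usr/pgsql/13/bin/psql}"

# Run SQL read from stdin with the article's connection flags. Extra args
# (e.g. -v name=value) pass through so psql can quote values as :'name';
# psql does not expand variables inside -c, hence the here-documents.
om_psql() {
  "$PSQL" -U postgres -h localhost -d operationsmanager -v ON_ERROR_STOP=1 "$@"
}

# Step 3: list FAILED certificate replacement tasks.
list_failed_replacements() {
  om_psql <<'SQL'
SELECT replacement_status_id, replacement_status, resource_fqdn
FROM certificatemanagement.replacement_status
WHERE replacement_status = 'FAILED';
SQL
}

# Step 4: mark one task SUCCESSFUL, only if it is still FAILED.
# RETURNING echoes the changed row; an empty result means nothing changed.
mark_replacement_successful() {
  [ -n "$1" ] || { echo "usage: mark_replacement_successful <ID>" >&2; return 1; }
  om_psql -v id="$1" <<'SQL'
UPDATE certificatemanagement.replacement_status
SET replacement_status = 'SUCCESSFUL'
WHERE replacement_status_id = :'id' AND replacement_status = 'FAILED'
RETURNING replacement_status_id, resource_fqdn;
SQL
}

# Step 5: list the certificate operations recorded for a domain.
list_domain_operations() {
  [ -n "$1" ] || { echo "usage: list_domain_operations <Domain_Name>" >&2; return 1; }
  om_psql -v domain="$1" <<'SQL'
SELECT workflow_id, operation_type, operation_status, start_time
FROM certificatemanagement.certificate_operation
WHERE domain_name = :'domain';
SQL
}

# Step 6: mark one workflow SUCCESSFUL.
mark_operation_successful() {
  [ -n "$1" ] || { echo "usage: mark_operation_successful <WORKFLOWID>" >&2; return 1; }
  om_psql -v wf="$1" <<'SQL'
UPDATE certificatemanagement.certificate_operation
SET operation_status = 'SUCCESSFUL'
WHERE workflow_id = :'wf'
RETURNING workflow_id, operation_type;
SQL
}

# Step 7: list certificate chain entries by expiry date, then delete one.
list_cert_expiry() {
  om_psql <<'SQL'
SELECT id, server_cert_id, issued_to, expiry_date
FROM certificatemanagement.certificate_chain_expiry
ORDER BY expiry_date;
SQL
}

delete_stale_cert_entry() {
  [ -n "$1" ] || { echo "usage: delete_stale_cert_entry <stale_cert_id>" >&2; return 1; }
  om_psql -v id="$1" <<'SQL'
DELETE FROM certificatemanagement.certificate_chain_expiry
WHERE id = :'id'
RETURNING id, issued_to, expiry_date;
SQL
}
```

For example, after sourcing the functions, run list_failed_replacements, review the output, and then call mark_replacement_successful with each replacement_status_id it returned. Because every change prints the affected row, an empty result is a quick sign that the ID did not match.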

Additional Information

If the issue involves VxRail service accounts, coordination with Dell support is required.