VCF upgrade fails on "SDDC Manager Deployment Drift stage" with an error "Service Validation Failed"

Article ID: 385592


Products

VMware SDDC Manager, VMware Cloud Foundation 5.x

Issue/Introduction

  • An SDDC Manager update from 5.2.1.0 to 5.2.1.1 or 5.2.1.2 fails at the "SDDC Manager Deployment Drift" stage with the error "Service Validation Failed".

  • As part of the update workflow, SDDC Manager reboots automatically. In this example, /var/log/vmware/vcf/commonsvcs/commonsvcs.log reports that the domain manager service failed to start:
    YYYY-MM-DD HH:MIN INFO [common,0000000000000000,0000] [com.zaxxer.hikari.HikariDataSource,SpringApplicationShutdownHook] HikariPool-1 - Shutdown initiated...
    YYYY-MM-DD HH:MIN INFO [common,0000000000000000,0000] [com.zaxxer.hikari.HikariDataSource,SpringApplicationShutdownHook] HikariPool-1 - Shutdown completed.
    YYYY-MM-DD HH:MIN INFO [common,0000000000000000,0000] [c.v.e.s.c.c.l.l.LogLayoutWithHeader,main] THIS LOG FILE IS MANAGED BY SDDC MANAGER
    YYYY-MM-DD HH:MIN INFO [common,0000000000000000,0000] [o.h.validator.internal.util.Version,background-preinit] HV000001: Hibernate Validator 8.0.0.Final
    YYYY-MM-DD HH:MIN INFO [common,0000000000000000,0000] [c.v.e.s.c.c.u.ComponentUpgradeRunner,main] Checking if a component upgrade is needed
    YYYY-MM-DD HH:MIN INFO [common,0000000000000000,0000] [c.v.e.s.i.s.VcfServiceInventoryServiceImpl,main] Get all VcfServices
    YYYY-MM-DD HH:MIN ERROR [common,0000000000000000,0000] [c.v.e.s.i.s.VcfServiceInventoryServiceImpl,main] Error while trying to retrieve service http://127.0.0.1/domainmanager/about status, 502 Bad Gateway: "<html><EOL><EOL><head><title>502 Bad Gateway</title></head><EOL><EOL><body><EOL><EOL><center><h1>502 Bad Gateway</h1></center><EOL><EOL><hr><center>nginx</center><EOL><EOL></body><EOL><EOL></html><EOL><EOL>"
  • After the reboot, /var/log/vmware/vcf/domainmanager/domainmanager.log reports the error 'Failed to update VCF Services and Photon rpms in SDDC Manager':
    YYYY-MM-DD HH:MIN: INFO: Updated /var/log/vmware/vcf/lcm/thirdparty/upgrades/a5##-##-##-##-##760/vcf-platform/upgrade/vcf_platform_upgrade.status status file with data OrderedDict([('upgradeId', 'a5##-##-##-##-##760'), ('resourceId', 'ac3##-##-##-##-##bca'), ('upgradeStatusCode', 'INPROGRESS'), ('progress', 70), (' error', OrderedDict([('errorCode', None), ('errorDescription', None)])), ('startTime', 1736376825), ('endTime', 1736377186)])
    YYYY-MM-DD HH:MIN: INFO: Rebooting SDDC Manager
    YYYY-MM-DD HH:MIN: INFO: Execute cmd: sh -x /var/log/vmware/vcf/lcm/thirdparty/bundles/d5##-##-##-##-##600/thirdparty/reboot_script.sh &
    YYYY-MM-DD HH:MIN: INFO: http://localhost/domainmanager/about is not accessible, retry after 10 seconds
    YYYY-MM-DD HH:MIN: INFO: URL: http://localhost/domainmanager/about
    YYYY-MM-DD HH:MIN: ERROR: RC: , OUT: ERR: Expecting value: line 1 column 1 (char 0)
    YYYY-MM-DD HH:MIN: ERROR: Failed to update VCF Services and Photon rpms in SDDC Manager
    YYYY-MM-DD HH:MIN: INFO:
    YYYY-MM-DD HH:MIN: INFO: RC: 1, OUT:
    YYYY-MM-DD HH:MIN: INFO: ERR: Traceback (most recent call last):
     File "/var/log/vmware/vcf/lcm/thirdparty/bundles/d5##-##-##-##-##600/thirdparty/vcf-platform-upgrade/bin/vcf_platform_upgrade.py.copy", line 521, in <module>
     wrapper.update_status(return_code=1, status='COMPLETED_WITH_FAILURE',
     File "/var/log/vmware/vcf/lcm/thirdparty/bundles/d5##-##-##-##-##600/thirdparty/vcf-platform-upgrade/bin/../../wrapper.py", line 187, in update_status
     raise Exception

  • Like the domain manager service, the operations manager service also fails to start automatically. The following error appears in /var/log/vmware/vcf/operationsmanager/operationsmanager.log:
    YYYY-MM-DD HH:MIN INFO [vcf_om,0000000000000000,0000] [com.zaxxer.hikari.HikariDataSource,SpringApplicationShutdownHook] HikariPool-1 - Shutdown initiated...
    YYYY-MM-DD HH:MIN DEBUG [vcf_om,677ef1873a72dd65052e80371dfed741,4fa6] [c.v.v.p.v.u.ValidateCredentialsTranslationTaskExecutor,om-exec-1] Exception occurred during validate credentials translation task : Error creating bean with name 'liquibase': Singleton bean creation not allowed while singletons of this factory are in destruction (Do not request a bean from a BeanFactory in a destroy method implementation!)
    org.springframework.beans.factory.BeanCreationNotAllowedException: Error creating bean with name 'liquibase': Singleton bean creation not allowed while singletons of this factory are in destruction (Do not request a bean from a BeanFactory in a destroy method implementation!)
     at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:220)
     at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:324)
     at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:200)
     at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:313)
     at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:200)
     at org.springframework.beans.factory.support.DefaultListableBeanFactory$1.orderedStream(DefaultListableBeanFactory.java:471)
     at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.detectPersistenceExceptionTranslators(PersistenceExceptionTranslationInterceptor.java:167)
     at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:149)
     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
     at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:135)
     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
     at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
     at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:244)
     at jdk.proxy2/jdk.proxy2.$Proxy295.findByStatusOrderByCreationTimeAsc(Unknown Source)
     at com.vmware.vcf.passwordmanager.validation.utils.ValidateCredentialsTranslationTaskExecutor$1.call(ValidateCredentialsTranslationTaskExecutor.java:53)
     at com.vmware.vcf.passwordmanager.validation.utils.ValidateCredentialsTranslationTaskExecutor$1.call(ValidateCredentialsTranslationTaskExecutor.java:47)
     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
     at com.vmware.vcf.common.tracing.TraceRunnable.run(TraceRunnable.java:59)
     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
     at java.base/java.lang.Thread.run(Thread.java:840)

  • Permission-related errors can be found in each service's <service name>.out file. One example is the operations manager log, /var/log/vmware/vcf/operationsmanager/operationsmanager.out:
    Caused by: java.io.FileNotFoundException: /etc/vmware/vcf/operationsmanager/application.properties (Permission denied)
     at java.base/java.io.FileInputStream.open0(Native Method)
     at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
     at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
     at java.base/java.io.FileInputStream.<init>(FileInputStream.java:111)
     at java.base/sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:86)
     at java.base/sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:189)
     at org.springframework.core.io.UrlResource.getInputStream(UrlResource.java:231)
     at org.springframework.boot.origin.OriginTrackedResource.getInputStream(OriginTrackedResource.java:61)
     at org.springframework.boot.env.OriginTrackedPropertiesLoader$CharacterReader.<init>(OriginTrackedPropertiesLoader.java:205)
     at org.springframework.boot.env.OriginTrackedPropertiesLoader.load(OriginTrackedPropertiesLoader.java:80)
     at org.springframework.boot.env.OriginTrackedPropertiesLoader.load(OriginTrackedPropertiesLoader.java:66)
     at org.springframework.boot.env.PropertiesPropertySourceLoader.loadProperties(PropertiesPropertySourceLoader.java:70)
     at org.springframework.boot.env.PropertiesPropertySourceLoader.load(PropertiesPropertySourceLoader.java:49)
     at org.springframework.boot.context.config.StandardConfigDataLoader.load(StandardConfigDataLoader.java:54)
     at org.springframework.boot.context.config.StandardConfigDataLoader.load(StandardConfigDataLoader.java:36)
     at org.springframework.boot.context.config.ConfigDataLoaders.load(ConfigDataLoaders.java:96)
     at org.springframework.boot.context.config.ConfigDataImporter.load(ConfigDataImporter.java:132)
     at org.springframework.boot.context.config.ConfigDataImporter.resolveAndLoad(ConfigDataImporter.java:87)
     ... 29 common frames omitted

  • Checking the service status confirms that both services fail to start and remain in an auto-restart loop:
    # systemctl status domainmanager
    * domainmanager.service - VMware Cloud Foundation Domain Manager
     Loaded: loaded (/etc/systemd/system/domainmanager.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since YYYY-MM-DD HH:MIN UTC; ##s ago
     Main PID: ## (code=exited, status=1/FAILURE)
    
    # systemctl status operationsmanager
    * operationsmanager.service - VMware Cloud Foundation Operations Manager
     Loaded: loaded (/etc/systemd/system/operationsmanager.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since YYYY-MM-DD HH:MIN UTC; ##s ago
     Main PID: ## (code=exited, status=1/FAILURE)
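The symptoms above can be checked in one pass. The following is a minimal POSIX shell sketch, assuming it is run as root on SDDC Manager; `has_signature`, `report_signature`, and `scan_vcf_logs` are hypothetical helper names, and the log paths and message fragments are copied from the excerpts in this article, so they may not match other builds exactly.

```shell
#!/bin/sh
# Sketch: look for the failure signatures quoted in this article.
# Assumption: paths and message fragments come from the log excerpts above
# and may differ between VCF builds.

# has_signature FILE TEXT: succeed if FILE is readable and contains TEXT
# as a fixed string (no regular-expression interpretation).
has_signature() {
  [ -r "$1" ] && grep -Fq -- "$2" "$1"
}

# report_signature BASE RELPATH TEXT: print FOUND, absent, or MISSING
# for one log file under BASE.
report_signature() {
  file="$1/$2"
  if [ ! -r "$file" ]; then
    status=MISSING
  elif has_signature "$file" "$3"; then
    status=FOUND
  else
    status=absent
  fi
  printf '%-8s %s\n' "$status" "$file"
}

# scan_vcf_logs [BASE]: BASE defaults to the standard VCF log directory.
scan_vcf_logs() {
  base="${1:-/var/log/vmware/vcf}"
  report_signature "$base" commonsvcs/commonsvcs.log \
    "domainmanager/about status, 502 Bad Gateway"
  report_signature "$base" domainmanager/domainmanager.log \
    "Failed to update VCF Services and Photon rpms in SDDC Manager"
  report_signature "$base" operationsmanager/operationsmanager.out \
    "application.properties (Permission denied)"
}
```

For example, save it as `scan_vcf_logs.sh` (a hypothetical file name) and run `. ./scan_vcf_logs.sh && scan_vcf_logs`. `FOUND` on all three files points toward this issue; `MISSING` only means the file does not exist at that path.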

Environment

VMware Cloud Foundation 5.2.x

Cause

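The /etc/vmware/vcf/domainmanager/application.properties and /etc/vmware/vcf/operationsmanager/application.properties files are owned by vcf_sos:vcf instead of their service users (vcf_domainmanager and vcf_operationsmanager). As a result, the services cannot read their configuration files (Permission denied) and fail to start after SDDC Manager reboots.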
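To confirm this cause before changing anything, the owner and group of both files can be compared with the accounts used in the workaround. This is a minimal sketch, assuming GNU coreutils `stat` (with `-c '%U:%G'`) is available on the appliance; `owner_of`, `check_owner`, and `check_vcf_properties` are hypothetical names.

```shell
#!/bin/sh
# Sketch: compare each application.properties owner with the expected
# service account. Assumption: GNU coreutils stat; expected owners are
# taken from the chown commands in the workaround.

# owner_of FILE: print FILE's owner and group as "user:group".
owner_of() {
  stat -c '%U:%G' -- "$1"
}

# check_owner FILE EXPECTED: print OK or MISMATCH.
# Returns 0 on match, 1 on mismatch, 2 if FILE cannot be inspected.
check_owner() {
  actual=$(owner_of "$1") || return 2
  if [ "$actual" = "$2" ]; then
    echo "OK       $1 ($actual)"
  else
    echo "MISMATCH $1 (is $actual, expected $2)"
    return 1
  fi
}

# check_vcf_properties: check both files; non-zero if either is wrong.
check_vcf_properties() {
  rc=0
  check_owner /etc/vmware/vcf/domainmanager/application.properties \
    vcf_domainmanager:vcf || rc=1
  check_owner /etc/vmware/vcf/operationsmanager/application.properties \
    vcf_operationsmanager:vcf || rc=1
  return "$rc"
}
```

On an affected appliance, `check_vcf_properties` would be expected to print `MISMATCH` lines showing `vcf_sos:vcf` and return a non-zero status.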

Resolution

Resolution:

This is a known issue in the current version of the product. The engineering team is aware of it and is developing a permanent fix, which will be included in a future product release. This article will be updated when the fix is available. Meanwhile, use the workaround below to mitigate the issue.

Workaround:

Revert the SDDC Manager VM to the snapshot that should have been taken before the upgrade attempt, then perform the steps below. If no pre-upgrade snapshot exists, these steps will not work.

  1. SSH to SDDC Manager as the vcf user and switch to root with su.
  2. Change the ownership back to the service users using the following commands:

    chown vcf_domainmanager:vcf /etc/vmware/vcf/domainmanager/application.properties
    chown vcf_operationsmanager:vcf /etc/vmware/vcf/operationsmanager/application.properties 
  3. Verify that the ownership has changed:

    ls -lrt /etc/vmware/vcf/domainmanager
    ls -lrt /etc/vmware/vcf/operationsmanager
  4. Restart the services:

    systemctl restart domainmanager
    systemctl restart operationsmanager

  5. To prevent similar occurrences, clean up any stale backup folders left behind by failed backup operations:

    rm -rf /var/log/vmware/vcf/sddc-support/backup-<tab for folder name>
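The workaround steps above can be collected into one script. This is a minimal sketch, not a Broadcom-provided tool: `run` and `fix_vcf_ownership` are hypothetical names, it assumes it is run as root after the snapshot revert, and it defaults to `DRY_RUN=1`, which only prints the commands. Step 5 asks you to pick the stale folder by name; the sketch instead assumes every `backup-*` folder is stale, so review the dry-run output before running with `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of workaround steps 2 to 5. Assumptions: run as root after the
# pre-upgrade snapshot has been reverted; DRY_RUN=1 (default) only prints.

DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = 1 ]; then
    return 0
  fi
  "$@"
}

fix_vcf_ownership() {
  # Step 2: hand each properties file back to its service user.
  run chown vcf_domainmanager:vcf /etc/vmware/vcf/domainmanager/application.properties || return 1
  run chown vcf_operationsmanager:vcf /etc/vmware/vcf/operationsmanager/application.properties || return 1
  # Step 3: list the directories so the new ownership can be checked.
  run ls -lrt /etc/vmware/vcf/domainmanager /etc/vmware/vcf/operationsmanager
  # Step 4: restart both services so they re-read their configuration.
  run systemctl restart domainmanager || return 1
  run systemctl restart operationsmanager || return 1
  # Step 5 (assumption): treat every backup-* folder as stale.
  for dir in /var/log/vmware/vcf/sddc-support/backup-*; do
    [ -d "$dir" ] || continue  # skip the literal pattern when nothing matches
    run rm -rf -- "$dir" || return 1
  done
}
```

For example, after `. ./fix_vcf_ownership.sh` (a hypothetical file name), run `fix_vcf_ownership` to preview the commands, then `DRY_RUN=0 fix_vcf_ownership` to apply them.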