vCenter Server UI unavailable with "No healthy upstream" error; vmware-stsd reports missing key (__MACHINE_CERT) in STS_INTERNAL_SSL_CERT store.

Article ID: 422821

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • When attempting to access the vCenter Server User Interface (UI), the browser displays the error:
    • 503 Service Unavailable (No healthy upstream)
  • Checking the service status shows that multiple services fail to start:
    • Running:
      applmgmt lookupsvc lwsmd observability observability-vapi vc-wsla-broker vmafdd vmcad vmdird vmware-cis-license vmware-eam vmware-envoy vmware-envoy-hgw vmware-envoy-sidecar vmware-infraprofile vmware-pod vmware-postgres-archiver vmware-rhttpproxy vmware-trustmanagement vmware-vapi-endpoint vmware-vdtc vmware-vmon vmware-vpostgres vtsdb
      Stopped:
      pschealth vlcm vmcam vmonapi vmware-analytics vmware-certificateauthority vmware-certificatemanagement vmware-content-library vmware-hvc vmware-imagebuilder vmware-netdumper vmware-perfcharts vmware-rbd-watchdog vmware-sca vmware-sps vmware-stsd vmware-topologysvc vmware-updatemgr vmware-vcha vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-ui vstats wcp
  • The vmware-stsd service crashes immediately when it is started.
    • From /var/log/vmware/vmon/vmon.log:
      YYYY-MM-DDTHH:MM:SSZ In(05) host-##### Received start request for sts
      YYYY-MM-DDTHH:MM:SSZ In(05) host-##### <sts-prestart> Constructed command: /usr/bin/python /usr/lib/vmidentity/install/sts-prestart-script.py /var/log/vmware/sso/sts-prestart.log
      YYYY-MM-DDTHH:MM:SSZ In(05) host-##### <sts> Service pre-start command completed successfully.
      YYYY-MM-DDTHH:MM:SSZ In(05) host-##### <sts> Constructed command: /usr/lib/vmidentity/install/sts-start-script.sh
      YYYY-MM-DDTHH:MM:SSZ Wa(03) host-##### <sts> Service exited. Exit code 1
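  • /var/log/vmware/sso/tomcat/catalina.log shows that the HTTPS protocol handler on port 7444 fails to initialize because the __MACHINE_CERT key cannot be read from the VECS key store: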
    • YYYY-MM-DDTHH:MM:SSZ INFO org.apache.coyote.http11.Http11NioProtocol Initializing ProtocolHandler ["https-Vecs Aware JSSE-nio-127.0.0.1-7444"]
      YYYY-MM-DDTHH:MM:SSZ SEVERE org.apache.catalina.startup.Bootstrap Error running command
      java.lang.Error: org.apache.catalina.LifecycleException: Protocol handler initialization failed
          at org.apache.catalina.startup.Catalina.load(Catalina.java:689)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:302)
          at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:475)
      Caused by: org.apache.catalina.LifecycleException: Protocol handler initialization failed
          at org.apache.catalina.connector.Connector.initInternal(Connector.java:1011)
          at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:127)
          at org.apache.catalina.core.StandardService.initInternal(StandardService.java:554)
          at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:127)
          at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:1046)
          at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:127)
          at org.apache.catalina.startup.Catalina.load(Catalina.java:686)
          ... 6 more
      Caused by: java.lang.IllegalArgumentException: Could not get key with alias __MACHINE_CERT from VECS key store
          at org.apache.tomcat.util.net.AbstractJsseEndpoint.createSSLContext(AbstractJsseEndpoint.java:109)
          at org.apache.tomcat.util.net.AbstractJsseEndpoint.initialiseSsl(AbstractJsseEndpoint.java:71)
          at org.apache.tomcat.util.net.NioEndpoint.bind(NioEndpoint.java:236)
          at org.apache.tomcat.util.net.AbstractEndpoint.bindWithCleanup(AbstractEndpoint.java:1334)
          at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:1347)
          at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:654)
          at org.apache.coyote.http11.AbstractHttp11Protocol.init(AbstractHttp11Protocol.java:75)
          at org.apache.catalina.connector.Connector.initInternal(Connector.java:1009)
          ... 12 more
      Caused by: java.io.IOException: Could not get key with alias __MACHINE_CERT from VECS key store
          at com.vmware.identity.tomcat.VECSAwareSSLImplementation.getTransientKeyStore(VECSAwareSSLImplementation.java:162)
          at com.vmware.identity.tomcat.VECSAwareSSLImplementation$1.getKeyManagers(VECSAwareSSLImplementation.java:65)
          at org.apache.tomcat.util.net.SSLUtilBase.createSSLContext(SSLUtilBase.java:268)
          at org.apache.tomcat.util.net.AbstractJsseEndpoint.createSSLContext(AbstractJsseEndpoint.java:107)
          ... 19 more
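The two symptoms above (vmware-stsd among the stopped services, and the __MACHINE_CERT error in the STS log) can be checked from the appliance Bash shell. The following is a minimal sketch, not an official tool: the helper names stopped_services and has_machine_cert_error are hypothetical, and it assumes the status listing uses the "Running:"/"Stopped:" layout shown above and that the STS log is at the path quoted in this article.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers for the "No healthy upstream" / vmware-stsd symptom.

# Read a service status listing (the "Running:"/"Stopped:" layout shown above)
# from stdin and print each service listed under "Stopped:", one per line.
stopped_services() {
  awk '
    /^[ \t]*Stopped:/    { in_stopped = 1; next }  # start of the Stopped section
    /^[ \t]*[A-Za-z]+:/  { in_stopped = 0 }        # any other section header ends it
    in_stopped           { for (i = 1; i <= NF; i++) print $i }
  '
}

# Return 0 if the given STS log file contains the VECS key lookup failure
# quoted above, non-zero otherwise.
has_machine_cert_error() {
  grep -q "Could not get key with alias __MACHINE_CERT from VECS key store" "$1"
}
```

For example, service-control --status --all | stopped_services | grep -x vmware-stsd prints vmware-stsd while the issue is present, and has_machine_cert_error /var/log/vmware/sso/tomcat/catalina.log succeeds when the log shows this article's signature. Depending on the build, the catalina log file name may carry a date suffix, so adjust the path if needed.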

Environment

  • VMware vCenter Server Appliance 7.X
  • VMware vCenter Server Appliance 8.X

Cause

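  • When vCenter Server has been upgraded from version 5.5, it retains the legacy Lookup Service certificate in the STS_INTERNAL_SSL_CERT store. In 5.5, this certificate was used by the endpoint https://FQDN:7444/lookupservice/sdk.
  • Current vCenter Server versions no longer use this certificate store. In the affected environment the store is still present but empty, so vmware-stsd cannot find the __MACHINE_CERT key and fails to start.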
  • Run the following command to list the aliases and expiry dates of the certificates in every VECS store, and check whether the STS_INTERNAL_SSL_CERT store is empty:
    • for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo "STORE $i"; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$i" --text | grep -E "Alias|Not After"; done
  • Sample output:
    • STORE TRUSTED_ROOT_CRLS
      Alias : 5907b##########ed08932ee
      Alias : ec701##########03d84bf798
      
      STORE STS_INTERNAL_SSL_CERT
      
      STORE machine
      Alias : machine
      Not After : Jan 19 02:03:21 2035 GMT
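On an appliance with many stores, an empty store is easy to miss in this listing. The sketch below filters the loop's output and prints every store that has no Alias lines. The helper name empty_stores is hypothetical, and the parsing assumes the exact "STORE <name>" and "Alias : ..." line format produced by the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the "STORE <name>" / "Alias : ..." listing produced by
# the vecs-cli loop above from stdin and print every store that has no Alias lines.
empty_stores() {
  awk '
    /^[ \t]*STORE / {
      if (store != "" && count == 0) print store   # previous store had no entries
      store = $2; count = 0; next
    }
    /^[ \t]*Alias[ \t]*:/ { count++ }
    END { if (store != "" && count == 0) print store }  # handle the last store
  '
}
```

Piping the loop into it (append | empty_stores after done) would print only STS_INTERNAL_SSL_CERT for the sample output above. An empty store is not necessarily a problem on its own; for this issue, the store of interest is STS_INTERNAL_SSL_CERT.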

Resolution

To resolve this issue, remove the stale STS_INTERNAL_SSL_CERT store from the VMware Endpoint Certificate Store (VECS) and restart the vCenter Server services. Because current vCenter Server versions no longer use this store, removing it allows the vmware-stsd service to initialize correctly on the next start.
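The removal and restart can be sketched as follows. This is not an official procedure: it assumes a current snapshot or backup of the appliance has been taken first, that the commands run as root in the appliance Bash shell, and that vecs-cli and service-control are at their default locations. The function names and the VECS_CLI and SERVICE_CONTROL override variables are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Sketch of the resolution steps (hypothetical helper names).
# Assumption: default tool locations; override via environment variables if needed.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"
SERVICE_CONTROL="${SERVICE_CONTROL:-service-control}"

# Delete the STS_INTERNAL_SSL_CERT store, but only if VECS actually lists it.
remove_sts_internal_store() {
  if "$VECS_CLI" store list | grep -qx "STS_INTERNAL_SSL_CERT"; then
    # -y suppresses the confirmation prompt
    "$VECS_CLI" store delete --name STS_INTERNAL_SSL_CERT -y
  else
    echo "STS_INTERNAL_SSL_CERT store not present; nothing to delete" >&2
  fi
}

# Stop every vCenter Server service, then start them all again.
restart_all_services() {
  "$SERVICE_CONTROL" --stop --all && "$SERVICE_CONTROL" --start --all
}
```

On the appliance, remove_sts_internal_store && restart_all_services performs both steps. Afterwards, re-running the vecs-cli listing command from the Cause section should no longer show an STS_INTERNAL_SSL_CERT store.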

Note: After the services have restarted, attempt to log in to the vSphere Client. If the "No healthy upstream" error persists or the vmware-stsd service still fails to start, please collect a log bundle (vc-support) and contact Broadcom Technical Support for further assistance.