vmware-vsan service stops abruptly and fails to start with the following error "An error occurred while starting service '%(0)s'"

Article ID: 413676

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • The vmware-vsan service stops abruptly and fails to start with the following error:

    "An error occurred while starting service '%(0)s'"

  • The following error is reported in /var/log/vmware/cloudvm/service-control.log:

    <>timestamp<> ERROR service-control Error executing start on service vsan-health. Details {
        "detail": [
            {
                "id": "install.ciscommon.service.failstart",
                "translatable": "An error occurred while starting service '%(0)s'",
                "args": [
                    "vsan-health"
                ],
                "localized": "An error occurred while starting service 'vsan-health'"
            }
        ],
        "componentKey": null,
        "problemId": null,
        "resolution": null
    }

  • /var/log/vmware/vmon/vmon.log reports 'HTTP Error 503: Service Unavailable':

    <>timestamp<> In(05) host-xx Received start request for vsan-health
    <>timestamp<> Wa(03) host-xx <vsan-health> Service api-health command's stderr: ERROR:root:Got URL error HTTP Error 503: Service Unavailable
    <>timestamp<> Wa(03)+ host-xx
    <>timestamp<> In(05) host-xx <vsan-health> Re-check service health since it is still initializing.
    <>timestamp<> In(05) host-xx <vsan-health> Service start operation timed out.
    <>timestamp<> Wa(03) host-xx <vsan-health> Found empty StopSignal parameter in config file. Defaulting to SIGTERM
    <>timestamp<> Wa(03) host-xx [ReadSvcSubStartupData] No startup information from vsan-health.
    <>timestamp<> In(05) host-xx Initiating vMon shutdown.

  • /var/log/vmware/vsan-health/vsanvcmgmtd.log reports a 503 (Service Unavailable) failure for loginExtensionByCertificate:

    <>timestamp<> info vsanvcmgmtd[2718185] [vSAN@6876 sub=PyCppVmomi] Loaded system certificate from VECS.
    <>timestamp<> info vsanvcmgmtd[2718185] [vSAN@6876 sub=vmomi.soapStub[72] opId=98e233f7] SOAP request returned HTTP failure; <<io_obj p:0x00007fe160002420, h:15, <TCP '127.0.0.1 : 52346'>, <TCP '127.0.0.1 : 1080'>>, /extension-login/sdk>, method: loginExtensionByCertificate; code: 503(Service Unavailable); fault: (null)
    <>timestamp<> warning vsanvcmgmtd[2718185] [vSAN@6876 sub=Py2CppStub opId=98e233f7] |- EExit LOCAL::vim.SessionManager.loginExtensionByCertificate (0 ms)
    <>timestamp<> warning vsanvcmgmtd[2718185] [vSAN@6876 sub=Py2CppStub opId=98e233f7] Exception while invoking VMOMI method 'LOCAL::vim.SessionManager.loginExtensionByCertificate': N7Vmacore4Http13HttpExceptionE(HTTP error response: Service Unavailable)
    --> [context]zKq7AVECAQAAACqAeAE+dnNhbnZjbWdtdGQAAEMcU2xpYnZtYWNvcmUuc28AAAgYQgApP0MAlplKASBCHmxpYnZtb21pLnNvAAE/ZCEB6pAhAV8KIQHX2hoCrPkEbGliUHlDcHBWbW9taS5zbwACFSYFA5S9EmxpYnB5dGhvbjMuMTAuc28uMS4wAANczxYDAroWAz70FgMCuhYD/hUXAwK6FgP+FRcDAroWA1zPFgMCuhYDXM8WAwK6FgNczxYDAroWA8CNEgNYlxIDtSAVA/sMFQN8jxID9xsXAwK6FgMPohID/hUXAwK6FgPAjRIDWJcSA7UgFQP7DBUDfI8SA/cbFwMCuhYDtsoWAwK6FgMPohID/hUXAwK6FgMPohID/hUXAwK6FgPALh4DSB0eA8RwFAMBLBcDAroWA/4VFwMCuhYDXM8WAwK6FgO2yhYDAroW[/context]

  • /var/log/vmware/vpxd-svcs/vpxd-svcs.log reports a 503 response with 'envoy overloaded':

    <>timestamp<> [refresh-lotus-locator-task [] INFO  com.vmware.cis.lotus.LotusLocator  opId=] vmAfClient.getDomainControllerEx("") : vcsa.vmware.com
    <>timestamp<> [refresh-lotus-locator-task [] INFO  com.vmware.cis.lotus.LotusLocator  opId=] Lotus hostname URL : vcsa.vmware.com
    <>timestamp<> [refresh-lotus-locator-task [] INFO  com.vmware.cis.lotus.LotusLocator  opId=] vmAfClient.getDomainName() in baseDn format : dc=vsphere,dc=local
    <>timestamp<> [refresh-lotus-locator-task [] INFO  com.vmware.cis.lotus.LotusLocator  opId=] Successfully refreshed machine account credentials
    <>timestamp<> [dataservice-0 [] WARN  com.vmware.cis.authorization.impl.AclPrivilegeValidator  opId=39e2744d-3c6e-422a-92b0-202ea77df50b IS] User VSPHERE.LOCAL\com.vmware.vr-sa-xxxx-xxx-xx-xx-xx does not have privileges [System.Read] on object urn%3Avmomi%3AInventoryServiceTag%3A64a67b0c-1013-44c4-833d-3a92a0a7a792%3AGLOBAL
    <>timestamp<> [dataservice-0 [] ERROR com.vmware.vim.vmomi.server.impl.InvocationTask  opId=39e2744d-3c6e-422a-92b0-xxxxIS] Method invocation threw unexpected exception!
    com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): envoy overloaded
            at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpUtil.validateHttpResponse(ApacheHttpUtil.java:101) ~[vapi-runtime.jar:?]
            at com.vmware.vapi.internal.protocol.client.rpc.http.HttpClient.invoke(HttpClient.java:174) ~[vapi-runtime.jar:?]
            at com.vmware.vapi.internal.protocol.client.rpc.http.HttpClient.send(HttpClient.java:187) ~[vapi-runtime.jar:?]

  • Run the following command in /var/log/vmware/envoy and confirm that the count is greater than 0:

    $ zgrep "503 overload" envoy-access-* | wc -l
    99828
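
The log checks above can be sketched as a small shell helper. This is an illustration only, not a supported tool: the function names `failed_service_starts` and `count_envoy_overload` are hypothetical, the default paths are the ones named in this article, and the match strings are taken from the excerpts shown here, so other builds may log slightly different text.

```shell
#!/bin/sh
# Sketch only: function names and argument defaults are illustrative.

# Print "<count> <service>" for each service whose start failed, based on
# the "Error executing start on service <name>. Details" log line.
failed_service_starts() {
    log="${1:-/var/log/vmware/cloudvm/service-control.log}"
    sed -n 's/.*Error executing start on service \([^ ]*\)\. Details.*/\1/p' "$log" |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Print "<count> <file>" for every envoy access log, then a "total <n>" line.
count_envoy_overload() {
    dir="${1:-/var/log/vmware/envoy}"
    total=0
    for f in "$dir"/envoy-access-*; do
        [ -e "$f" ] || continue        # the glob matched nothing
        # zgrep reads plain and gzip-compressed files; -c prints the match count.
        n=$(zgrep -c "503 overload" "$f" 2>/dev/null)
        n=${n:-0}
        echo "$n $f"
        total=$((total + n))
    done
    echo "total $total"
}
```

Calling `failed_service_starts` and `count_envoy_overload` with no arguments on the vCenter Server appliance reads the default paths. A non-zero total from the second function corresponds to the condition described in the last bullet, and the per-file counts show which rotated logs contain the overload responses.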

Environment

VMware vCenter Server 8.x

Cause

Memory exhaustion in the envoy-sidecar service can impact vCenter Server services and trigger 503 (Service Unavailable) errors.

Resolution

  • This issue is similar to the one described in the KB article "Workload Management tab fails to load" and is resolved in vCenter Server 8.0 U3g.
  • As a workaround, reboot vCenter Server; the vmware-vsan service then starts successfully. However, the issue can recur until vCenter Server is upgraded to a release that contains the fix.
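
After the reboot, the service state can be checked from the appliance shell. The following is a minimal sketch, assuming that `service-control --status` groups service names under headings such as `Running:` and `Stopped:`. The helper name `service_is_running` and the `SERVICE_CONTROL` override variable are hypothetical and exist only to make the check scriptable and testable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named service appears under "Running:"
# in the output of `service-control --status <service>`.
# Assumption: the output is grouped under headings like "Running:" / "Stopped:".
service_is_running() {
    svc="$1"
    sc="${SERVICE_CONTROL:-service-control}"   # override only for testing
    "$sc" --status "$svc" 2>/dev/null | awk -v svc="$svc" '
        /^[A-Za-z]+:/ { in_running = ($1 == "Running:"); next }
        in_running    { for (i = 1; i <= NF; i++) if ($i == svc) found = 1 }
        END           { exit !found }
    '
}
```

For example, `service_is_running vsan-health && echo "vsan-health is running"` prints the message only when the service is listed as running. If it is still stopped, `service-control --start vsan-health` attempts a manual start.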