vSAN-Health Service fails to start

Article ID: 318854

Products

VMware vCenter Server

VMware vSphere ESXi

VMware Site Recovery Manager 8.x

Issue/Introduction

Symptoms:

  • Submitting a backup job through the vCenter Server Appliance Management Interface (VAMI) fails with the error:

Invalid vCenter Server Status: All required services are not up! Stopped services: 'vsan-health'.

  • Operations in vSphere Replication fail with the error:

Operation Failed
A generic error occurred in the vSphere Replication Management Server. Exception details: 'Unexpected status code: 503'.

  • Certain operations on VMware Site Recovery Manager fail
  • Starting the vSAN Health service manually fails with the output below:

         service-control --start vmware-vsan-health

    Operation not cancellable. Please wait for it to finish...
    Performing start operation on service vsan-health...
    Error executing start on service vsan-health. Details {
        "problemId": null,
        "detail": [
            {
                "args": [
                    "vsan-health"
                ],
                "id": "install.ciscommon.service.failstart",
                "translatable": "An error occurred while starting service '%(0)s'",
                "localized": "An error occurred while starting service 'vsan-health'"
            }
        ],
        "resolution": null,
        "componentKey": null
    }
    Service-control failed. Error: {
        "problemId": null,
        "detail": [
            {
                "args": [
                    "vsan-health"
                ],
                "id": "install.ciscommon.service.failstart",
                "translatable": "An error occurred while starting service '%(0)s'",
                "localized": "An error occurred while starting service 'vsan-health'"
            }
        ],
        "resolution": null,
        "componentKey": null
    }


  • The log file /var/log/vmware/vsan-health/vmware-vsan-health-runtime.log.stderr reports the following error:
    Starting service process with pid: 53357.
    Traceback (most recent call last):
      File "/usr/lib/vmware-vpx/vsan-health/VsanVcMgmtd.py", line 9, in
        os.initgroups(entry.pw_name, entry.pw_gid)
    PermissionError: [Errno 1] Operation not permitted


  • Each start attempt logs a new PID in vmware-vsan-health-runtime.log.stderr, but the service remains stopped.
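
The log symptom above can be checked from a shell on the appliance. The following is a minimal sketch, not an official diagnostic: the function names `has_initgroups_error` and `count_start_attempts` are hypothetical helpers, and the log path is the one quoted in this article.

```shell
#!/bin/sh
# Log path as quoted in this article; override LOG to point elsewhere.
LOG="${LOG:-/var/log/vmware/vsan-health/vmware-vsan-health-runtime.log.stderr}"

# Hypothetical helper: exits 0 only if the file contains both the
# os.initgroups call and the PermissionError line from the traceback.
has_initgroups_error() {
    grep -qF 'os.initgroups' "$1" &&
        grep -qF 'PermissionError: [Errno 1] Operation not permitted' "$1"
}

# Hypothetical helper: prints how many start attempts were logged.
count_start_attempts() {
    grep -cF 'Starting service process with pid:' "$1"
}

# Only runs where the log is readable (that is, on the appliance).
if [ -r "$LOG" ]; then
    if has_initgroups_error "$LOG"; then
        echo "Symptom found after $(count_start_attempts "$LOG") start attempt(s)"
    else
        echo "PermissionError signature not found in $LOG"
    fi
fi
```

If the signature is not present, the failure probably has a different cause and the resolution in this article may not apply.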

Environment

VMware vCenter Server Appliance 6.7.x

VMware vSphere Replication 8.x

Cause

  • The vsan-health service, like the other vCenter Server services, is expected to start under the root account.
  • The issue is most likely to occur when the service was disabled at the time the appliance was successfully upgraded to 6.7 U3.
  • In that case, the service property and JSON files are not updated with the required attribute.
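
The article does not name the attribute that fails to update. As a read-only sketch, the lines that mention a user in the service's configuration and state files can be listed for comparison. The file name `vsan-health.json` and the helper `show_user_lines` are assumptions for illustration; only `.state_vsan-health.json` and the directory are taken from this article.

```shell
#!/bin/sh
# Directory taken from this article; override SVC_CFG_DIR to point elsewhere.
SVC_CFG_DIR="${SVC_CFG_DIR:-/etc/vmware/vmware-vmon/svcCfgfiles}"

# Hypothetical helper: prints every line mentioning "user" (case-insensitive)
# in the service files found in the given directory. It changes nothing.
show_user_lines() {
    for f in "$1/vsan-health.json" "$1/.state_vsan-health.json"; do
        if [ -f "$f" ]; then
            # vsan-health.json is an assumed file name, not confirmed here.
            grep -iH 'user' "$f" || echo "$f: no line mentions a user"
        fi
    done
}

show_user_lines "$SVC_CFG_DIR"
```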

Resolution

To resolve this issue, follow these steps:

  1. Take a snapshot of the vCenter Server Appliance.
  2. Open an SSH session to the vCenter Server Appliance.
  3. Go to /etc/vmware/vmware-vmon/svcCfgfiles/:
     cd /etc/vmware/vmware-vmon/svcCfgfiles/
  4. List the hidden files in this location:
     ls -la
  5. Copy .state_vsan-health.json to a location outside the svcCfgfiles folder:
     cp .state_vsan-health.json ../.state_vsan-health.json
  6. Delete /etc/vmware/vmware-vmon/svcCfgfiles/.state_vsan-health.json:
     rm /etc/vmware/vmware-vmon/svcCfgfiles/.state_vsan-health.json
  7. Navigate to /usr/lib/vmware-vmon:
     cd /usr/lib/vmware-vmon
  8. Run the command:
     vmon-cli -U vsan-health -R Root
  9. Start the service:
     service-control --start vmware-vsan-health

Note: Remember to consolidate the snapshot taken in step 1 once the solution is verified.

Note: Skip steps 5 and 6 if .state_vsan-health.json is not found.
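
The command-line steps above, including the skip condition from the note, can be sketched as a single script. This is an illustration rather than a supported tool: the function names and the `--apply` switch are hypothetical, and the commands themselves are the ones listed in the steps. Take the snapshot from step 1 before running anything like this.

```shell
#!/bin/sh
# Paths and file name as given in the resolution steps.
SVC_CFG_DIR="${SVC_CFG_DIR:-/etc/vmware/vmware-vmon/svcCfgfiles}"
STATE_FILE=".state_vsan-health.json"

# Steps 5 and 6: copy the state file to the parent directory, then delete it.
# If the file is missing, both steps are skipped, as the note allows.
backup_and_remove_state() {
    if [ -f "$1/$STATE_FILE" ]; then
        cp "$1/$STATE_FILE" "$1/../$STATE_FILE" && rm "$1/$STATE_FILE"
    else
        echo "$STATE_FILE not found in $1; skipping steps 5 and 6"
    fi
}

# Steps 7 to 9: run vmon-cli from its directory, then start the service.
reset_and_start() {
    cd /usr/lib/vmware-vmon || return 1
    vmon-cli -U vsan-health -R Root || return 1
    service-control --start vmware-vsan-health
}

# Hypothetical safety switch: nothing is changed unless --apply is passed.
if [ "${1:-}" = "--apply" ]; then
    backup_and_remove_state "$SVC_CFG_DIR" && reset_and_start
fi
```

Keeping the destructive part behind an explicit switch means the file can be reviewed or sourced without changing the appliance.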

Additional Information

Impact/Risks:
  • VAMI backups fail.
  • Virtual machines cannot be configured for vSphere Replication.