"Exception occurred in postInstallHook for B2B-patching" error when updating vCenter Server 8.0U3 to 8.0U3e
search cancel

"Exception occurred in postInstallHook for B2B-patching" error when updating vCenter Server 8.0U3 to 8.0U3e

book

Article ID: 404316

calendar_today

Updated On:

Products

VMware vCenter Server 8.0

Issue/Introduction

  • vCenter Server update from 8.0U3 to 8.0U3e version fails with error:
    "Exception occurred in postInstallHook for B2B-patching. Please check the logs for more details. Take corrective action and then resume".

  • The vsan-health service will be in stopped state and unable to start on newly deployed VCSA appliance.

  • In /var/log/vmware/applmgmt/PatchRunner.log

YYYY-MM-DDTHH:MM:SSZ INFO service_manager The service vmware-vapi-endpoint has been already started
YYYY-MM-DDTHH:MM:SSZ INFO service_manager Getting starttype of service vmware-vsan-health
YYYY-MM-DDTHH:MM:SSZ INFO service_manager Executing command '[['/usr/sbin/vmon-cli', '--status', 'vsan-health']]'
YYYY-MM-DDTHH:MM:SSZ INFO service_manager Command '[['/usr/sbin/vmon-cli', '--status', 'vsan-health']]' has exit-code='0' and stdout: Name: vsan-health
Starttype: AUTOMATIC
RunState: STOPPED
RunAsUser: vsan-health
CurrentRunStateDuration(ms): 223510
HealthState: UNHEALTHY
FailStop: FALSE
MainProcessId: N/A

       AND

YYYY-MM-DDTHH:MM:SSZ ERROR vmware_b2b.patching.phases.patcher Patch hook Patch got unhandled exception.
Traceback (most recent call last):
  File "/storage/seat/software-updatefpci6cm7/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 208, in patch
    _patchComponents(ctx, userData, statusAggregator.reportingQueue)
  File "/storage/seat/software-updatefpci6cm7/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 89, in _patchComponents
    _startDependentServices(c)
  File "/storage/seat/software-updatefpci6cm7/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 56, in _startDependentServices
    serviceManager.start(depService)
  File "/storage/seat/software-updatefpci6cm7/stage/scripts/patches/libs/sdk/service_manager.py", line 909, in wrapper
    return getattr(controller, attr)(*args, **kwargs)
  File "/storage/seat/software-updatefpci6cm7/stage/scripts/patches/libs/sdk/service_manager.py", line 799, in start
    super(VMwareServiceController, self).start(serviceName)
  File "/storage/seat/software-updatefpci6cm7/stage/scripts/patches/libs/sdk/service_manager.py", line 665, in start
    raise IllegalServiceOperation(errorText)
service_manager.IllegalServiceOperation: Service cannot be started. Error: Error executing start on service vsan-health. Details {
    "detail": [
        {
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "args": [
                "vsan-health"
            ],
            "localized": "An error occurred while starting service 'vsan-health'"
        }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
}
Service-control failed. Error: {
    "detail": [
        {
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "args": [
                "vsan-health"
            ],
            "localized": "An error occurred while starting service 'vsan-health'"
        }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
}

  • The failure can also generate a crash dump under /var/core/ directory with name: core.vsanvcmgmtd-wor.####

  • In /var/log/vmware/vmon/vmon.log

YYYY-MM-DDTHH:MM:SSZ In(05) host-#### Received start request for vsan-health
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health-prestart> Constructed command: /usr/bin/python /usr/lib/vmware-vsan/scripts/vsanvcmgmtd-prestart.pyc
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health> Service pre-start command completed successfully.
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health> Constructed command: /usr/sbin/vsanvcmgmtd -s -c /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml -u /etc/vmware-vsan-health/VsanMgmtCustomizedConfig.xml
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health> Running the API Health command as user vsan-health
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health-healthcmd> Constructed command: /usr/bin/python /usr/lib/vmware-vpx/vsan-health/vsanhealth-vmon-apihealth.py
YYYY-MM-DDTHH:MM:SSZ Wa(03) host-#### <vsan-health> Service api-health command's stderr: ERROR:root:Got URL error HTTP Error 503: Service Unavailable

YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health> Re-check service health since it is still initializing.
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### Client info Uid=0,Gid=0,Pid=174544,Comm=(vmon-coredumper),PPid=2,Comm=(kthreadd),PPid=0
YYYY-MM-DDTHH:MM:SSZ In(05) host-#### <vsan-health> Service is dumping core. Coredump count 0. CurrReq: 1
YYYY-MM-DDTHH:MM:SSZ Wa(03) host-#### [ReadSvcSubStartupData] No startup information from vsan-health.
YYYY-MM-DDTHH:MM:SSZ Wa(03) host-#### <vsan-health> Service exited. Exit code 1

  • The cns.vpx_storage_volume_update table has volume_id that begins with 'file:'

    • Connect to VCDB:
      > /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

    • Check for volume_id that begins with 'file:':

      > SELECT * FROM cns.vpx_storage_volume_update WHERE volume_id LIKE 'file:%';

             volume_id              |             datastore                         | vclock | modified | deleted | corrupted
      -------------------------------------------+------------------------------------------------------------+--------+----------+---------+-----------
       file:######-####-####-#### | ds:///vmfs/volumes/vsan:######-####-####-####/ | 16##54 | f        | t       | f
       file:######-####-####-#### | ds:///vmfs/volumes/vsan:######-####-####-####/ | 16##54 | f        | t       | f
       file:######-####-####-#### | ds:///vmfs/volumes/vsan:######-####-####-####/ | 16##54 | f        | t       | f
       file:######-####-####-#### | ds:///vmfs/volumes/vsan:######-####-####-####/ | 16##54 | f        | t       | f
      (4 rows)

Environment

vCenter Server 8.0U3

Cause

  • At boot up time, cns tries to set in-memory cache based on VCDB contents.
  • It runs into a null pointer when trying to set some block volume specific data structure, but is getting file: volume instead. This table, "cns.vpx_storage_volume_update", is never meant to hold file volumes.
  • These entries will not cause problem until next vsan-health reboot, hence possibly during the upgrade. Causing upgrade failure.

Resolution

This issue is fixed in vCenter server 8.0U3e.

Workaround:

NOTE: Ensure to take a valid vCenter backup and a snapshot before proceeding with the changes. Ensure to take offline snapshots for all the vCenter Virtual Machines in case of ELM.

  1. Dump the CNS tables:
    > /opt/vmware/vpostgres/current/bin/pg_dump -U postgres -d VCDB -n cns -Fc > cnstables.dmp

  2. Connect to VCDB:
    > /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

  3. Query the table to check if any vol ID with 'file:' exists:
    > SELECT * FROM cns.vpx_storage_volume_update WHERE volume_id LIKE 'file:%';

  4. Delete all rows with 'file:' in volume_id field  of cns.vpx_storage_volume_update table:
    > DELETE FROM cns.vpx_storage_volume_update WHERE volume_id LIKE 'file:%';

  5. Validate whether rows with 'file:' in volume_id field are deleted from cns.vpx_storage_volume_update table using Step-3.

  6. Upgrade the vCenter server.

Note: Upgrading to vCenter server 8.0U3e will not prevent this issue from occurring. The above workaround needs to be applied for upgrades from a lower version to vCenter server 8.0U3e. Future upgrades from vCenter server 8.0U3e will prevent this issue from occurring.