Patching 8.0u3 24022515 to 8.0u3e fails with: "Exception occurred in postInstallHook"

Article ID: 405183

Products

VMware vCenter Server
VMware vCenter Server 8.0

Issue/Introduction

  • While patching vCenter Server from version 8.0u3 build 24022515 to 8.0u3e, the patch process fails at 80% with the following error:

    Exception occurred in postInstallHook for B2B-pathing. Please check the logs for more details. Take corrective action and then resume

  • Log analysis from /var/log/vmware/applmgmt/Patchrunner.log shows:
    YYYY-MM-DDThh:mm:ssZ  ERROR vmware_b2b.patching.phases.patcher Patch hook Patch got unhandled exception.
    Traceback (most recent call last):
      File "/storage/seat/software-update_5fygr8x/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 208, in patch
        _patchComponents(ctx, userData, statusAggregator.reportingQueue)
      File "/storage/seat/software-update_5fygr8x/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 89, in _patchComponents
        _startDependentServices(c)
      File "/storage/seat/software-update_5fygr8x/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 56, in _startDependentServices
        serviceManager.start(depService)
      File "/storage/seat/software-update_5fygr8x/stage/scripts/patches/libs/sdk/service_manager.py", line 909, in wrapper
        return getattr(controller, attr)(*args, **kwargs)
      File "/storage/seat/software-update_5fygr8x/stage/scripts/patches/libs/sdk/service_manager.py", line 799, in start
        super(VMwareServiceController, self).start(serviceName)
      File "/storage/seat/software-update_5fygr8x/stage/scripts/patches/libs/sdk/service_manager.py", line 665, in start
        raise IllegalServiceOperation(errorText)
    service_manager.IllegalServiceOperation: Service cannot be started. Error: Error executing start on service vsan-health. Details {
        "detail": [
            {
                "id": "install.ciscommon.service.failstart",
                "translatable": "An error occurred while starting service '%(0)s'",
                "args": [
                    "vsan-health"
                ],
                "localized": "An error occurred while starting service 'vsan-health'"
            }
        ],
        "componentKey": null,
        "problemId": null,
        "resolution": null
    }
    Service-control failed. Error: {
        "detail": [
            {
                "id": "install.ciscommon.service.failstart",
                "translatable": "An error occurred while starting service '%(0)s'",
                "args": [
                    "vsan-health"
                ],
                "localized": "An error occurred while starting service 'vsan-health'"
            }
        ],
        "componentKey": null,
        "problemId": null,
        "resolution": null
    }

    YYYY-MM-DDThh:mm:ssZ  WARNING root stopping status aggregation...
    YYYY-MM-DDThh:mm:ssZ  ERROR __main__ Patch vCSA failed
  • Corresponding /var/log/vmware/vsan-health/vmware-vsan-health-service.log also shows:

    Traceback (most recent call last):
      File "bora/vsan/common/VsanScheduler.py", line 111, in Run
      File "bora/vsan/clustermgmt/vpxd/VsanClusterPrototypeImpl.py", line 5017, in ReconcileDatastoreName
      File "bora/vsan/vsanvp/vpxd/pyMoVsan/VsanVpUtil.py", line 24, in GetClusterFromContainerId
    ImportError: cannot import name 'GetClusterMoId' from '_VsanMgmtServer' (unknown location)
    YYYY-MM-DDThh:mm:ssZ INFO vsan-mgmt[260913] [VsanScheduler::_ThreadMain opID=vsan-6######8bde-W8] Job done
    YYYY-MM-DDThh:mm:ssZ INFO vsan-mgmt[261002] [VsanScheduler::ScheduleWorkItem opID=vsan-PC-63933f73c8bde] Work entities length: 5
    YYYY-MM-DDThh:mm:ssZ INFO vsan-mgmt[260712] [VsanVcModuleImporter::startImport opID=noOpId] Importing VSAN extension VsanVcStretchedCluster__ext_init__
    YYYY-MM-DDThh:mm:ssZ INFO vsan-mgmt[260916] [VsanScheduler::_ThreadMain opID=vsan-PC-6######8bde-W8] Executing itemListHead: datastore-###-ReconcileDatastoreName: func: ReconcileDatastoreName, {'conn': <VsanManagementVcConnection.VsanManagementVcConnection object at 0x7fa73e1f2110>, 'db': <VsanClusterPrototypeImpl.PersistenceHelper object at #######0>, 'datastore': 'vim.Datastore:datastore-###'}, {}
    YYYY-MM-DDThh:mm:ssZ ERROR vsan-mgmt[260914] [VsanScheduler::_ThreadMain opID=vsan-PC-63933f73c8bde-W6] Workitem 6 failed
    Traceback (most recent call last):
      File "bora/vsan/common/VsanScheduler.py", line 357, in _ThreadMain
      File "bora/vsan/common/VsanScheduler.py", line 111, in Run
      File "bora/vsan/clustermgmt/vpxd/VsanClusterPrototypeImpl.py", line 5017, in ReconcileDatastoreName
      File "bora/vsan/vsanvp/vpxd/pyMoVsan/VsanVpUtil.py", line 24, in GetClusterFromContainerId
    ImportError: cannot import name 'GetClusterMoId' from '_VsanMgmtServer' (unknown location)

Environment

VMware vCenter Server 8.x

 

Cause

The issue is caused by invalid entries in the cns.vpx_storage_volume_update table in the vCenter Server database (VCDB). These entries have volume_id values prefixed with file:, which are not expected in this table.

During startup, CNS attempts to populate its in-memory cache from the database contents. It encounters a null pointer error because it expects block volume-specific data but receives file volume information instead.

The cns.vpx_storage_volume_update table is not intended to store file volumes. While these entries do not cause immediate issues, problems can arise the next time the vsan-health service restarts, and potentially during an upgrade.

Example of problematic entries:

select * from cns.vpx_storage_volume_update;

                 volume_id                 |                         datastore                          | vclock | modified | deleted | corrupted 
-------------------------------------------+------------------------------------------------------------+--------+----------+---------+-----------
 file:79de7b37-####-4ae7-####-69a####0938f | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | f        | t       | f
 file:0f66d117-####-4061-####-5fab##30bd1# | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | f        | t       | f
 file:9480f2da-####-48a2-####-e8df22c97873 | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | f        | t       | f
 013bdbba-####-424c-####-5b####573c29      | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | t        | f       | t
 2306eea1-####-4bbe-####-adb2####3f36      | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | t        | f       | t
 6228ee2a-####-4151-####-e9f8d###60ae      | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | t        | f       | t
 b4e48852-####-46e8-####-dcf08e0fea14      | ds:///vmfs/volumes/vsan:52c#####b3b4#3-c85####4bbd###/ |    723 | t        | f       | t
 file:0a8afb23-####-499a-####-69a####0938f | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
 file:40e20911-####-40b9-####-c1#####72e13 | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
 file:1258fb5e-####-4c57-####-72b####70208 | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
 file:3334cc86-####-4a1d-####-e01#####f807 | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
 file:b6a6f4ac-####-4504-####-7bdf####0454 | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
 file:175d751a-####-429b-####-0ad####facda | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
 file:bf669c95-####-4c0a-####-e4####d7f0e5 | ds:///vmfs/volumes/vsan:5289####d528b##-659####fbe4####/ | 140669 | f        | t       | f
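
As a quick check, the command below (a minimal sketch that reuses the psql client path shown in the Resolution section) counts only the unexpected file:-prefixed rows per datastore; any non-zero result indicates the condition described above.

    # Count file:-prefixed volume entries per datastore (run on the vCenter Server appliance)
    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c \
      "SELECT datastore, count(*) AS file_volume_entries
         FROM cns.vpx_storage_volume_update
        WHERE volume_id LIKE 'file:%'
        GROUP BY datastore;"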

Resolution

NOTE: Take a snapshot of the vCenter Server before making any changes to the database. If the vCenter Server is in Enhanced Linked Mode (ELM), power off all vCenter Servers in the ELM configuration and then take the snapshots.

  • Stop the vpxd and content-library services:

    service-control --stop vpxd
    service-control --stop content-library

  • Connect to the vCenter database:

    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

  • Review problematic entries:

    select * from cns.vpx_storage_volume_update;
  • If the table contains entries with a volume_id prefixed with file:, delete those entries (see the verification example after these steps):

    DELETE FROM cns.vpx_storage_volume_update WHERE volume_id LIKE 'file:%';
  • Start the services again:

    service-control --start vpxd
    service-control --start content-library

  • Retry the patching process.
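
Verification example (referenced above; a minimal sketch combining commands already used in this article): before retrying the patch, the query should return 0 and both services should report as running.

    # Confirm no file:-prefixed entries remain (expected count: 0)
    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c \
      "SELECT count(*) FROM cns.vpx_storage_volume_update WHERE volume_id LIKE 'file:%';"

    # Confirm the services are running again
    service-control --status vpxd
    service-control --status content-library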