Patching Fails on Management or Service vCenter in Linked Mode

Products

VMware vCenter Server

Issue/Introduction

During vCenter patching, the installer retries repeatedly until it reaches the maximum number of allowed attempts, and the patch fails. This leaves the vCenter in a broken state, even though it appears operational after the patching state is cleaned up. Subsequent patch attempts also fail because the previous update did not complete properly.

cd /var/log/vmware/applmgmt

PatchRunner.log

####-##-## ##:##:##,###.##Z wcp:Patch WARNING root User wcp is not in group cis or group cis does not exist
####-##-## ##:##:##,###.##Z wcp:Patch ERROR wcp Failed to apply patch %s! Error: %s.
####-##-## ##:##:##,###.##Z wcp:Patch ERROR wcp Not all patches were applied. Latest applied patch is 1
####-##-## ##:##:##,###.##Z wcp:Patch ERROR vmware_b2b.patching.executor.hook_executor Patch hook 'wcp:Patch' failed.
Traceback (most recent call last):
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/py/vmware_b2b/patching/executor/hook_executor.py", line 74, in executeHook
    executionResult = systemExtension(args)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/libs/sdk/extensions.py", line 106, in __call__
    result = self.extension(*args)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/libs/sdk/extensions.py", line 123, in _func
    return func(*args)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/payload/components-script/wcp/__init__.py", line 213, in doPatching
    doIncrementalPatching(current_version)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/payload/components-script/wcp/__init__.py", line 340, in doIncrementalPatching
    raise user_error
patch_errors.UserError: Failed to apply patch roles_groups_users! Error: {
    "detail": [
        {
            "id": "install.ciscommon.command.errinvoke",
            "translatable": "An error occurred while invoking external command : '%(0)s'",
            "args": [
                "Error 53 while creating SSO group \"NsxAdministrators\":\ndir-cli failed. Error 53: Possible errors: \nLDAP error: Server is unwilling to perform \nWin Error: Operation failed with error ERROR_BAD_NETPATH (53) \n"
            ],
            "localized": "An error occurred while invoking external command : 'Error 53 while creating SSO group \"NsxAdministrators\":\ndir-cli failed. Error 53: Possible errors: \nLDAP error: Server is unwilling to perform \nWin Error: Operation failed with error ERROR_BAD_NETPATH (53) \n'"
        }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
}.
####-##-##T##:##:##.##Z ERROR vmware_b2b.patching.phases.patcher Patch hook Patch got ComponentWrapperError.
Traceback (most recent call last):
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 203, in patch
    _patchComponents(ctx, userData, statusAggregator.reportingQueue)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 85, in _patchComponents
    executeComponentHook(Hook.Patch, ctx, c, userData, reportingQueue)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/py/vmware_b2b/patching/executor/execution_facade.py", line 98, in executeComponentHook
    reportQueue, identifier, expectedResultType)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/py/vmware_b2b/patching/executor/execution_facade.py", line 53, in executeHook
    result = executor.executeHook(scriptFile, hook, args, reportQueue, reportIdentifier)
  File "/storage/seat/software-update6lsigt6l/stage/scripts/patches/py/vmware_b2b/patching/executor/hook_executor_process.py", line 119, in executeHook
    raise ex
patch_errors.ComponentError
####-##-##T##:##:##.##Z WARNING root stopping status aggregation...
####-##-##T##:##:##.##Z ERROR __main__ Patch vCSA failed
####-##-##T##:##:##.##Z INFO __main__ Start executing Discovery of vCSA patching components with following arguments 'PatchRunner.py discovery -o /tmp/tmp######### -d /storage/seat/software-update6lsigt6l/stage/patch_runner --disableStdoutLogging'
####-##-##T##:##:##.##Z INFO vmware_b2b.patching.phases.discoverer This is a Resume Flow, skipping running Discovery hook again
####-##-##T##:##:##.##Z ERROR __main__ Discovery of vCSA patching components failed
####-##-##T##:##:##.##Z INFO __main__ Start executing Discovery of vCSA patching components with following arguments 'PatchRunner.py discovery -o /tmp/tmp######## -d /storage/seat/software-update6lsigt6l/stage/patch_runner --disableStdoutLogging'
####-##-##T##:##:##.##Z INFO vmware_b2b.patching.phases.discoverer This is a Resume Flow, skipping running Discovery hook again
####-##-##T##:##:##.##Z ERROR __main__ Discovery of vCSA patching components failed
####-##-##T##:##:##.##Z INFO __main__ Start executing Discovery of vCSA patching components with following arguments 'PatchRunner.py discovery -o /tmp/tmp#########_z -d /storage/seat/software-update6lsigt6l/stage/patch_runner --disableStdoutLogging'
####-##-##T##:##:##.##Z INFO vmware_b2b.patching.phases.discoverer This is a Resume Flow, skipping running Discovery hook again
####-##-##T##:##:##.##Z ERROR __main__ Discovery of vCSA patching components failed
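
To check whether a PatchRunner.log shows the same pattern as the excerpt above, the log can be scanned for the message fragments it contains. The following is a minimal sketch, not a VMware tool: the function names and the `looks_like_this_issue` heuristic are hypothetical, and only the message fragments themselves are taken from the excerpt.

```python
import re
from collections import Counter

# Message fragments copied from the PatchRunner.log excerpt in this article.
SIGNATURES = {
    "wcp_hook_failed": re.compile(r"Patch hook 'wcp:Patch' failed"),
    "sso_group_error_53": re.compile(r"Error 53 while creating SSO group"),
    "resume_flow": re.compile(r"This is a Resume Flow"),
    "discovery_failed": re.compile(r"Discovery of vCSA patching components failed"),
}


def count_signatures(log_text):
    """Count how many log lines contain each known message fragment."""
    counts = Counter({name: 0 for name in SIGNATURES})
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_this_issue(counts):
    """Hypothetical heuristic: the wcp hook failed, then discovery failed on resume."""
    return (
        counts["wcp_hook_failed"] > 0
        and counts["resume_flow"] > 0
        and counts["discovery_failed"] > 0
    )


# Shortened sample built from lines in the excerpt (timestamps omitted).
SAMPLE_LOG = "\n".join([
    "ERROR vmware_b2b.patching.executor.hook_executor Patch hook 'wcp:Patch' failed.",
    "INFO vmware_b2b.patching.phases.discoverer This is a Resume Flow, skipping running Discovery hook again",
    "ERROR __main__ Discovery of vCSA patching components failed",
    "INFO vmware_b2b.patching.phases.discoverer This is a Resume Flow, skipping running Discovery hook again",
    "ERROR __main__ Discovery of vCSA patching components failed",
])

sample_counts = count_signatures(SAMPLE_LOG)
print(dict(sample_counts))
print(looks_like_this_issue(sample_counts))
```

To run it against a real log, read the file contents into a string and pass that to `count_signatures`. A match only indicates that the log resembles this excerpt; it does not confirm the root cause.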

Environment

7.0 U3t

7.0 U3v

Cause

Multiple retries are attempted during patching, leading to the error:

"Installation failed – Install in progress – You have reached the maximum number of retries to resume patching. Please restore the vCenter from the backup."
Incorrect steps were then followed for the error below, which resulted in the patching state on the appliance being cleaned up. When the maximum number of resume attempts is exhausted, the vCenter should instead be reverted to a backup rather than having its patching state cleaned up.
```
Disk /storage/archive in vCenter Server has total 105 GB and free 5 GB.
Resolution
There is not enough space on the disk(s). Increase disk space and try again
```
At this stage, the vCenter is already in a broken state because the WCP patch failed. The cleanup makes the system treat the appliance as if it were in a fresh state, so the update is attempted again even though the previous update did not complete and left the system partially broken.
After the cleanup, the vCenter appears to be working, but it remains broken because the underlying cause has not been addressed.
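
The disk-space message quoted above reports 105 GB total and 5 GB free on /storage/archive. As an illustration of how that figure could be checked before a patch attempt, here is a minimal sketch using Python's standard `shutil.disk_usage`. The 20% threshold and all function names are hypothetical and are not a VMware requirement; the sketch only reports free space and does not change the patching state.

```python
import shutil

GIB = 1024 ** 3

# Hypothetical threshold, chosen only for illustration; not a VMware requirement.
MIN_FREE_FRACTION = 0.20


def free_fraction(total, free):
    """Return free space as a fraction of total, or 0.0 if total is not positive."""
    if total <= 0:
        return 0.0
    return free / total


def has_headroom(total, free, minimum=MIN_FREE_FRACTION):
    """True if the free fraction meets the (assumed) minimum."""
    return free_fraction(total, free) >= minimum


def partition_report(path):
    """Query a mounted path and return its size and free space in GiB."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_gib": usage.total / GIB,
        "free_gib": usage.free / GIB,
        "has_headroom": has_headroom(usage.total, usage.free),
    }


# Figures from the message quoted above: total 105 GB, free 5 GB.
print(round(free_fraction(105, 5), 3))
print(has_headroom(105, 5))
```

On an appliance, `partition_report("/storage/archive")` would report on the partition named in the message, assuming that path is mounted.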

Resolution

Steps to restore both vCenters from file-based backups and patch them after the restoration:

1. Delete both the management and service vCenter VMs.
2. Restore the service vCenter first, upgrade it successfully, and confirm completion.
3. Restore the management vCenter and attempt the upgrade. If it fails, delete the newly restored management VM again, power off the service vCenter, restore the management vCenter, and patch it while the service vCenter is off.
4. Once the patch completes successfully, power on the service vCenter and confirm that both vCenters are in sync.

Note:

KB article 406559 provides steps to prevent the restoration from failing at 80%. Make sure to follow these steps before starting the restore.
Additionally, if you have offline snapshots, do not remove them unless a KB article that matches your current issue instructs you to do so.

Additional Information

You may encounter the following issue during patching.

The VMdir state may switch to standalone or read-only mode.

KB 390951

You may encounter the following issue during restoration.

The restoration may fail at 80% with the errors described in the KB articles listed below.

KB 344773 / KB 326315