vCenter 8.0 U2 patching fails with the error "Failed to perform Cleanup"

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

The following entries in update_microservice.log confirm this scenario:

[YYYY-MM-DDTHH:MM:SS] - 116521 - snapshot:: check_status: 274 - DEBUG - workflow execution is not finished; current status: {'workflowName': 'cap-lvm-snapshot-cleanup', 'instanceId': '########-####-####-####-########bb88', 'task': 'reclaim-vfree', 'status': 'Running', 'message': 'Reclaimed snapshot /dev/sdq12', 'progress': '50%'}
[YYYY-MM-DDTHH:MM:SS] - 116521 - snapshot:: cleanup: 191 - ERROR - Error occurred while performing snapshot cleanup; error: {workflow is not finished}
[YYYY-MM-DDTHH:MM:SS] - 116521 - update_b2b_target:: _cleanup_snapshot:2621 - ERROR - Failed to cleanup vcenter snapshot; err: workflow is not finished
[YYYY-MM-DDTHH:MM:SS] - 116521 - task_manager_target:: update: 92 - DEBUG - UpdateTask: status=FAILED, progress=80, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Failed to perform cleanup']}, failure_state=None
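To check an appliance for this failure signature, the log can be searched for the two messages shown above. This is a minimal sketch: the function name check_cleanup_failure is hypothetical, and the log path is passed as an argument because this article does not give the full path of update_microservice.log.

```shell
#!/bin/sh
# Hypothetical helper: report whether an update_microservice.log contains the
# snapshot-cleanup failure described in this article.
# Usage: check_cleanup_failure /path/to/update_microservice.log
check_cleanup_failure() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # Both messages must be present: the snapshot cleanup error and the
    # final "Failed to perform cleanup" task status.
    if grep -q "Failed to cleanup vcenter snapshot; err: workflow is not finished" "$log_file" &&
       grep -q "Failed to perform cleanup" "$log_file"; then
        echo "match"
        return 0
    fi
    echo "no match"
    return 1
}
```

A "match" result only indicates that the log contains the messages; the Cap Engine log should still be checked to confirm that the cleanup eventually completed.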

The Cap Engine log shows the following entries, which confirm that the cleanup completed but took longer than the timeout.
Note: The two logs use different time zones: the update microservice appears to log in UTC, while the Cap Engine logs in local time. Log file path: /var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log

[YYYY-MM-DDTHH:MM:SS] progress.go:11: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] task_progress.go:24: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:221: Task reclaim-vfree completed
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:183: All tasks finished for workflow
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:354: Updating instance status to Completed
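Because the two logs use different time zones, their timestamps have to be normalized before they can be compared. The sketch below, which assumes GNU date (for the -d option) and that the appliance's current local time zone is the one the Cap Engine logged in, reads the update microservice timestamp as UTC and the Cap Engine timestamp as local time, then prints how many seconds after the reported failure the cleanup finished. The function name and sample timestamps are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: seconds between the update microservice failure
# (logged in UTC) and the Cap Engine "Completed" entry (logged in local time).
# Requires GNU date for the -d option.
# Usage: cleanup_lag_seconds 2024-01-01T10:00:00 2024-01-01T05:03:00
cleanup_lag_seconds() {
    # Replace the ISO 8601 "T" separator with a space for date -d.
    failed_utc=$(printf '%s' "$1" | tr 'T' ' ')
    done_local=$(printf '%s' "$2" | tr 'T' ' ')
    # -u: interpret the update microservice timestamp as UTC.
    failed_epoch=$(date -u -d "$failed_utc" +%s) || return 1
    # No -u: interpret the Cap Engine timestamp in the local time zone (TZ).
    done_epoch=$(date -d "$done_local" +%s) || return 1
    echo $((done_epoch - failed_epoch))
}
```

A positive result means the cleanup finished after the patching workflow had already reported the failure.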

Environment

VMware vCenter Server 8.0.2

Cause

The LVM snapshot cleanup takes longer than the patching workflow allows, so the workflow times out (the timeout is currently 4 minutes) and reports a failure. However, the patch itself completes successfully; the snapshot cleanup is a post-patching operation.
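To confirm whether the cleanup eventually completed after the patching workflow gave up, the Cap Engine workflow log can be polled for the completion message shown in the Symptoms section. This is a minimal sketch, not part of the product: the function name, the 600-second default wait and the 10-second polling interval are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: poll the Cap Engine workflow log until the snapshot
# cleanup reports completion, or give up after a maximum wait.
# Usage: wait_for_cap_cleanup [log_file] [max_wait_seconds] [interval_seconds]
wait_for_cap_cleanup() {
    log_file="${1:-/var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log}"
    max_wait="${2:-600}"   # assumed upper bound, longer than the 4-minute patch timeout
    interval="${3:-10}"    # assumed polling interval
    waited=0
    while [ "$waited" -lt "$max_wait" ]; do
        if grep -q "Updating instance status to Completed" "$log_file" 2>/dev/null; then
            echo "cleanup completed after ${waited}s of waiting"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "cleanup not reported as completed within ${max_wait}s" >&2
    return 1
}
```

The log may also contain completion entries from earlier runs, so the timestamp of the matching line should be checked as well.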

Resolution

This issue is resolved in vCenter Server 8.0 Update 2a, Build 22617221. For details, see the vCenter Server 8.0 Update 2a Release Notes.
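A simple way to decide whether an 8.0 appliance already contains the fix is to compare its build number with 22617221. The sketch below assumes the build number has been read from the appliance (for example, from the Summary page of the appliance management interface); the function name is hypothetical, and the comparison is only meaningful for vCenter Server 8.0 builds.

```shell
#!/bin/sh
# Hypothetical helper: report whether a vCenter Server 8.0 build number is at
# or above 8.0 Update 2a (Build 22617221), the release that contains the fix.
# Build numbers of other major versions are not comparable this way.
# Usage: has_cleanup_fix 22617221
FIXED_BUILD=22617221

has_cleanup_fix() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid build number: $1" >&2; return 2 ;;
    esac
    if [ "$1" -ge "$FIXED_BUILD" ]; then
        echo "fixed"
    else
        echo "not fixed"
    fi
}
```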

Workaround:

To work around the issue, use one of the following options:

Option 1: Run the steps automatically with a script:

After the patch fails with the error "Failed to perform Cleanup", download the updateStateRemover.sh script attached to this KB and copy it to the vCenter Server Appliance (vCSA).
Log in to the vCSA as root using an SSH client (for example, PuTTY). If the session opens in the appliance shell, run shell to switch to Bash.
Make the script executable: chmod +x updateStateRemover.sh
Run the script: ./updateStateRemover.sh
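The last two steps can be wrapped in a small guard that fails early if the script has not been copied to the appliance yet. This is a sketch only: the function name and the default directory (the current directory) are assumptions, and it should be run from the Bash shell.

```shell
#!/bin/sh
# Hypothetical wrapper around the Option 1 steps: make updateStateRemover.sh
# executable and run it, failing early if the script is missing.
# Usage: run_state_remover [directory_containing_script]
run_state_remover() {
    script="${1:-.}/updateStateRemover.sh"
    if [ ! -f "$script" ]; then
        echo "updateStateRemover.sh not found in ${1:-.}" >&2
        return 1
    fi
    chmod +x "$script" || return 1
    # Run the script by path; its exit status becomes the function's status.
    "$script"
}
```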

Option 2: Perform the steps manually:

Because VMDIR is still in Standalone mode, the patching process is not completed successfully. Follow the steps below to set the VMDIR state to NORMAL.

Log in to the replication partner vCenter Server (embedded vCenter Server) using an SSH client.
Switch to the Bash shell by running the following command:

shell

Check the current VMDIR state by running the following command (you are prompted for the SSO administrator credentials):

/usr/lib/vmware-vmafd/bin/dir-cli state get

Example:

/usr/lib/vmware-vmafd/bin/dir-cli state get

Enter password for administrator@vsphere.local:

Directory Server State: Standalone (8)
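When the state has to be checked from a script, the state name can be extracted from the output format shown in the example above. A minimal sketch, assuming the output has been saved or piped to standard input; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the state name from `dir-cli state get` output,
# e.g. "Directory Server State: Standalone (8)" -> "Standalone".
# Reads the output on standard input; other lines (such as the password
# prompt) are ignored.
# Usage: vmdir_state < saved_dir_cli_output.txt
vmdir_state() {
    sed -n 's/^Directory Server State: *\([A-Za-z]*\).*/\1/p'
}
```

The confirmation line printed by state set ("Directory Server State set to: ...") is deliberately not matched, because it does not have the "State:" prefix.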

If the state is Standalone, run the following command to set VMDIR to NORMAL. If it is already NORMAL, skip this step and proceed to the step that refreshes the Likewise Service Manager.

/usr/lib/vmware-vmafd/bin/dir-cli state set --state NORMAL

Enter password for administrator@vsphere.local:

Directory Server State set to: NORMAL (3)
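The conditional step above (set NORMAL only when the state is Standalone) can be sketched as a wrapper around the two dir-cli commands shown in this article. This is an illustration under stated assumptions: the function name is hypothetical, the DIR_CLI variable exists only so the wrapper can be tested with a stub, and dir-cli still prompts for the administrator@vsphere.local password on each call.

```shell
#!/bin/sh
# Hypothetical wrapper: set VMDIR to NORMAL only when dir-cli reports
# Standalone. DIR_CLI defaults to the path used in this article and can be
# overridden (for example, with a stub for testing).
DIR_CLI="${DIR_CLI:-/usr/lib/vmware-vmafd/bin/dir-cli}"

ensure_vmdir_normal() {
    # Show dir-cli's output on stderr while also capturing it.
    state_output=$("$DIR_CLI" state get | tee /dev/stderr)
    case "$state_output" in
        *"Directory Server State: Standalone"*)
            "$DIR_CLI" state set --state NORMAL
            ;;
        *)
            echo "VMDIR is not in Standalone state; no change made" >&2
            ;;
    esac
}
```

After the wrapper runs, the state should still be verified with dir-cli state get as described below.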

Run the following command to confirm the change:

/usr/lib/vmware-vmafd/bin/dir-cli state get

Enter password for administrator@vsphere.local:

Directory Server State: Normal (3)

Refresh the Likewise Service Manager by running the following command:

/opt/likewise/bin/lwsm refresh

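Remove the patch staging directory, the patching database, the software update state file, and the dynamically named software-update directory:

rm -rf /storage/<subdir>/software-update/stage/
rm /storage/db/patching.db
rm /etc/applmgmt/appliance/software_update_state.conf
rm -rf /storage/<subdir>/software-updatemh63juvn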

Note: The suffix of the software-update directory name (mh63juvn in the last command above) is generated dynamically and differs on each appliance; replace it with the actual directory name.
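Because the suffix differs on every appliance, it can help to list the matching directories and check the name before running rm -rf. A minimal sketch, assuming the directory sits under the same /storage/<subdir> path used in the commands above (the <subdir> value is not specified here and must be filled in); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the dynamically named software-update<suffix>
# directories under a base directory. The plain "software-update" directory
# is not listed, because the pattern requires at least one extra character.
# Usage: list_software_update_dirs /storage/<subdir>
list_software_update_dirs() {
    for dir in "$1"/software-update?*; do
        # When nothing matches, the unexpanded pattern is not a directory and is skipped.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    return 0
}
```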

Reboot vCenter.

Attachments

updateStateRemover