To confirm this scenario, check update_microservice.log for the following failure symptom:
2023-09-28 14:39:12,436 - 116521 - snapshot:: check_status: 274 - DEBUG - workflow execution is not finished; current status: {'workflowName': 'cap-lvm-snapshot-cleanup', 'instanceId': 'e1aa4364-6d10-46a6-88b8-3b7f3963bb88', 'task': 'reclaim-vfree', 'status': 'Running', 'message': 'Reclaimed snapshot /dev/sdq12', 'progress': '50%'}
2023-09-28 14:39:12,436 - 116521 - snapshot:: cleanup: 191 - ERROR - Error occurred while performing snapshot cleanup; error: {workflow is not finished}
2023-09-28 14:39:12,436 - 116521 - update_b2b_target:: _cleanup_snapshot:2621 - ERROR - Failed to cleanup vcenter snapshot; err: workflow is not finished
2023-09-28 14:39:12,436 - 116521 - task_manager_target:: update: 92 - DEBUG - UpdateTask: status=FAILED, progress=80, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Failed to perform cleanup']}, failure_state=None
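A quick way to check an appliance for this symptom is sketched below. It is a minimal sketch, not part of the KB procedure: the function name check_cleanup_symptom is hypothetical, and the log path must be passed in because the full location of update_microservice.log is not given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for the snapshot-cleanup timeout symptom in an
# update_microservice.log whose path is supplied as the first argument.
check_cleanup_symptom() {
  local log_file="$1"
  # -F treats the patterns as fixed strings; -q only sets the exit status.
  if grep -qF "'workflowName': 'cap-lvm-snapshot-cleanup'" "$log_file" &&
     grep -qF "Failed to cleanup vcenter snapshot; err: workflow is not finished" "$log_file"; then
    echo "symptom-found"
  else
    echo "symptom-not-found"
  fi
}

# Usage (replace the placeholder with the real log location):
# check_cleanup_symptom /path/to/update_microservice.log
```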
The Cap Engine log shows the entries below, which confirm that the cleanup completed but took longer than the timeout.
Note: The two logs do not use the same time zone. The update microservice appears to log in UTC, while the Cap Engine logs in local time. (Cap Engine log file path: /var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log)
2023/09/28 16:40:52.008088 progress.go:11: Reclaim task complete.
2023/09/28 16:40:52.008300 task_progress.go:24: Reclaim task complete.
2023/09/28 16:40:52.014217 workflow_manager.go:221: Task reclaim-vfree completed
2023/09/28 16:40:52.014251 workflow_manager.go:183: All tasks finished for workflow
2023/09/28 16:40:52.014265 workflow_manager.go:354: Updating instance status to Completed
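To check whether the cleanup eventually finished on an affected appliance, the completion entry can be pulled out of the Cap Engine log. This is a minimal sketch assuming the workflow.log line format shown above; the function name cleanup_completed_at is hypothetical.

```shell
#!/usr/bin/env bash
# Cap Engine workflow log for the snapshot-cleanup workflow (path from this KB).
WORKFLOW_LOG=/var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log

# Hypothetical helper: print the local-time timestamp of the most recent
# "Updating instance status to Completed" entry, or nothing if there is none.
cleanup_completed_at() {
  local log_file="${1:-$WORKFLOW_LOG}"
  # Keep only completion lines, take the last one, print its date and time fields.
  grep -F "Updating instance status to Completed" "$log_file" | tail -n 1 | awk '{print $1, $2}'
}

# Usage: cleanup_completed_at
# For the entries above this prints: 2023/09/28 16:40:52.014265
```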
LVM takes longer than expected to clean up the snapshots, so the patching workflow times out (the timeout is currently 4 minutes). However, the patch itself completed successfully; the snapshot cleanup is a post-patching operation.
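Because the two logs use different time zones, their timestamps need to be normalized before they can be compared. The sketch below, which is illustrative rather than part of the KB procedure, computes how many seconds after the update microservice reported the failure the Cap Engine marked the workflow Completed. It assumes GNU coreutils date and that the shell's TZ matches the time zone the Cap Engine logs in; the function name completion_lag_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper.
# $1: failure timestamp from update_microservice.log (UTC),  e.g. "2023-09-28 14:39:12,436"
# $2: completion timestamp from workflow.log (local time),  e.g. "2023/09/28 16:40:52.014265"
# Assumes GNU date and that $TZ matches the Cap Engine's local time zone.
completion_lag_seconds() {
  local fail_ts="${1%%,*}"          # drop ",milliseconds"
  local done_ts="${2%%.*}"          # drop ".microseconds"
  done_ts="${done_ts//\//-}"        # 2023/09/28 -> 2023-09-28
  local fail_epoch done_epoch
  fail_epoch=$(date -u -d "$fail_ts" +%s)   # interpret as UTC
  done_epoch=$(date -d "$done_ts" +%s)      # interpret in the local time zone
  echo $(( done_epoch - fail_epoch ))
}

# Usage: completion_lag_seconds "2023-09-28 14:39:12,436" "2023/09/28 16:40:52.014265"
```

A positive result means the cleanup finished after the update microservice had already given up waiting, which matches the scenario described in this KB.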
To work around and resolve the issue, follow either one of the options below:
Option 1: To perform the steps automatically with a script, follow the instructions below:
After the patch fails with the error "Failed to perform cleanup", download the updateStateRemover.sh script attached to this KB to the vCenter Server.
Log in to the vCSA as root using an SSH client (for example, PuTTY or any similar client).
Make the script executable: chmod +x updateStateRemover.sh
Run the script: ./updateStateRemover.sh
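The Option 1 steps can also be wrapped in a small function that stops early if the script is missing. This is a minimal sketch, not part of the attached script: the function name run_update_state_remover and its directory argument (where updateStateRemover.sh was downloaded, defaulting to the current directory) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the Option 1 steps. $1 is the directory the KB
# script was downloaded to (defaults to the current directory).
run_update_state_remover() {
  local dir="${1:-.}"
  local script="$dir/updateStateRemover.sh"
  if [ ! -f "$script" ]; then
    echo "updateStateRemover.sh not found in $dir" >&2
    return 1
  fi
  chmod +x "$script"                          # same as: chmod +x updateStateRemover.sh
  ( cd "$dir" && ./updateStateRemover.sh )    # run it from the download directory
}

# Usage, as root on the vCSA, from the download directory: run_update_state_remover
```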
Option 2: To perform the steps manually, follow the steps below:
Because VMDIRD is still in Standalone mode, the patching is not reported as successfully completed. Follow the steps below to set VMDIRD back to NORMAL mode.