Symptoms:
The following entries in update_microservice.log confirm this scenario:
[YYYY-MM-DDTHH:MM:SS] - 116521 - snapshot:: check_status: 274 - DEBUG - workflow execution is not finished; current status: {'workflowName': 'cap-lvm-snapshot-cleanup', 'instanceId': '########-####-####-####-########bb88', 'task': 'reclaim-vfree', 'status': 'Running', 'message': 'Reclaimed snapshot /dev/sdq12', 'progress': '50%'}
[YYYY-MM-DDTHH:MM:SS] - 116521 - snapshot:: cleanup: 191 - ERROR - Error occurred while performing snapshot cleanup; error: {workflow is not finished}
[YYYY-MM-DDTHH:MM:SS] - 116521 - update_b2b_target:: _cleanup_snapshot:2621 - ERROR - Failed to cleanup vcenter snapshot; err: workflow is not finished
[YYYY-MM-DDTHH:MM:SS] - 116521 - task_manager_target:: update: 92 - DEBUG - UpdateTask: status=FAILED, progress=80, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Failed to perform cleanup']}, failure_state=None
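The failure signature above can be checked for with a small shell helper. This is a minimal sketch: the function name is hypothetical, and the log path is supplied by the caller because the location of update_microservice.log is not stated in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of vCenter): returns 0 when the given log
# contains the snapshot-cleanup timeout signature shown above, 1 otherwise.
# The caller passes the path to update_microservice.log as the first argument.
has_cleanup_timeout_signature() {
    log_file="$1"
    grep -q "Failed to cleanup vcenter snapshot; err: workflow is not finished" "$log_file"
}
```

For example, `has_cleanup_timeout_signature /path/to/update_microservice.log && echo "matches this KB"` prints the message only when the signature is present.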
The Cap Engine log (/var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log) shows the entries below, which confirm that the cleanup completed but took longer than the timeout.
Note: The two logs use different time zones. The update microservice appears to log in UTC, while Cap Engine logs in local time.
[YYYY-MM-DDTHH:MM:SS] progress.go:11: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] task_progress.go:24: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:221: Task reclaim-vfree completed
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:183: All tasks finished for workflow
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:354: Updating instance status to Completed
LVM takes longer than expected to clean up the snapshots, so the patching workflow times out (the timeout is currently 4 minutes). However, patching itself completed successfully; the snapshot cleanup is a post-patching operation.
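To confirm that the cleanup did eventually finish, the last completion entry in the Cap Engine workflow log can be printed. This is a sketch only: the function and variable names are hypothetical, and the default path is the one given in the note above.

```shell
#!/bin/sh
# Default path taken from this article; pass a different file as $1 to override.
CAP_WORKFLOW_LOG="/var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log"

# Hypothetical helper: prints the most recent "Completed" status entry, if any.
# Prints nothing when the workflow never reached the Completed state.
last_cleanup_completion() {
    log_file="${1:-$CAP_WORKFLOW_LOG}"
    grep "Updating instance status to Completed" "$log_file" | tail -n 1
}
```

When comparing the timestamp of this entry with the ERROR entry in update_microservice.log, allow for the time zone difference between the two logs.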
Workaround:
To work around the issue, follow one of the options below:
Option 1: Run the steps automatically with a script
Download the updateStateRemover.sh script attached to this KB to the vCenter Server on which the patch failed with the error "Failed to perform Cleanup".
Log in to the vCSA as root using an SSH client (for example, PuTTY).
Make the script executable: chmod +x updateStateRemover.sh
Run the script: ./updateStateRemover.sh
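The last two steps of Option 1 can be wrapped in a small guard that refuses to run when the script has not been copied yet. This is a sketch under assumptions: the wrapper function is hypothetical, and it assumes the script sits in the current directory unless a path is passed.

```shell
#!/bin/sh
# Hypothetical wrapper: makes the downloaded script executable and runs it.
# $1 is the path to updateStateRemover.sh (defaults to the current directory).
# Pass a path containing a slash so the shell does not search PATH for it.
run_update_state_remover() {
    script="${1:-./updateStateRemover.sh}"
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    chmod +x "$script" && "$script"
}
```

Calling `run_update_state_remover` with no argument is equivalent to the chmod and run steps above, with an explicit error if the file is missing.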
Option 2: Run the steps manually
Because VMDIRD is still in Standalone mode, patching has not completed successfully. Follow the steps below to set the VMDIRD state to NORMAL.