Symptoms:
In /var/log/vmware/applmgmt/update_microservice.log, entries similar to the following can be found:
YYYY-MM-DD HH:MM:SS,995 - 97765 - update_functions_target:: __init__: 770 - DEBUG - Running python /storage/seat/software-update########/stage/update/snapshot.py --lvmOp cleanup --stageDir /storage/seat/software-update########/stage
YYYY-MM-DD HH:MM:SS,577 - 97765 - update_functions_target:: runCommandAndCheckResult: 428 - DEBUG - runCommandAndCheckResult failed: '1+0 records in\n1+0 records out\n16 bytes copied, #.#### s, ###kB/s\nTraceback (most recent call last):\n File "/storage/seat/software-update########/stage/update/snapshot.py", line 389, in <module>\n main()\n
YYYY-MM-DD HH:MM:SS,577 - 97765 - update_b2b_target:: _cleanup_snapshot:2677 - ERROR - Failed to cleanup vcenter snapshot; err: Failed to cleanup snapshot
YYYY-MM-DD HH:MM:SS,578 - 97765 - task_manager_target:: update: 91 - DEBUG - UpdateTask: status=FAILED, progress=80, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Failed to perform cleanup']}, failure_state=None
YYYY-MM-DD HH:MM:SS,624 - 97765 - update_microservice:: waitForEvents: 517 - INFO - Exiting by timeout
YYYY-MM-DD HH:MM:SS,624 - 97765 - update_microservice:: _deletePidFile: 341 - DEBUG - Removing pid file: /var/run/vmware/applmgmt/update_microservice.pid
YYYY-MM-DD HH:MM:SS,683 - 97765 - update_microservice:: __del__: 403 - DEBUG - Closing socket...
YYYY-MM-DD HH:MM:SS,685 - 97765 - update_microservice:: __del__: 405 - DEBUG - Removing sockfile
The Cap Engine log at /var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log shows the following entries, confirming that the cleanup completed, although it took longer than the timeout period:
[YYYY-MM-DDTHH:MM:SS] progress.go:11: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] task_progress.go:24: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:221: Task reclaim-vfree completed
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:183: All tasks finished for workflow
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:354: Updating instance status to Completed
Cause:
LVM takes longer than expected to clean up snapshots, so the patching workflow times out. The patch itself has been applied successfully; the failure occurs during the post-patching cleanup phase.
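To confirm that an appliance matches this pattern, the two log checks described above can be combined. The following is a minimal sketch, not a supported tool: the function name check_cleanup_timeout and its result strings are hypothetical, and it only searches for the two messages quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the update log shows the cleanup failure
# while the Cap Engine workflow log shows that the cleanup eventually completed.
# Arguments: $1 = path to update_microservice.log, $2 = path to workflow.log
check_cleanup_timeout() {
    update_log="$1"
    workflow_log="$2"

    # The update task must have failed with the message quoted in Symptoms.
    if ! grep -q "Failed to perform cleanup" "$update_log" 2>/dev/null; then
        echo "no-cleanup-failure"
        return 1
    fi

    # The Cap Engine workflow must have reached the Completed status.
    if grep -q "Updating instance status to Completed" "$workflow_log" 2>/dev/null; then
        echo "cleanup-completed-after-timeout"
        return 0
    fi

    echo "cleanup-not-completed"
    return 2
}
```

On an affected appliance, calling check_cleanup_timeout with /var/log/vmware/applmgmt/update_microservice.log and /var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log as arguments would be expected to print cleanup-completed-after-timeout.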
Workaround:
Option 1: To run the steps automatically using a script, follow the instructions below:
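Copy the updateStateRemover.sh script to the vCenter Server appliance (/tmp) after the patch fails with the error "Failed to perform cleanup".
Make the script executable: chmod +x updateStateRemover.sh
Run the script using the command: ./updateStateRemover.sh
Reboot the appliance: reboot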
Option 2: To perform the steps manually, please follow the instructions outlined below:
If the VMDIRD state is not NORMAL, follow the steps below to set it.
Launch the shell:
shell
Check the current state:
/usr/lib/vmware-vmafd/bin/dir-cli state get
Enter password for [email protected]:
Directory Server State: Standalone (8)
Set the state to NORMAL:
/usr/lib/vmware-vmafd/bin/dir-cli state set --state NORMAL
Enter password for [email protected]:
Directory Server State set to: NORMAL (3)
Confirm the new state:
/usr/lib/vmware-vmafd/bin/dir-cli state get
Enter password for [email protected]:
Directory Server State: Normal (3)
Run the Likewise service manager refresh:
/opt/likewise/bin/lwsm refresh
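The decision of whether the state needs changing depends only on the "Directory Server State:" line shown above. As a rough sketch, that line can be parsed from captured dir-cli output; the function names vmdir_state_name and vmdir_is_normal are hypothetical, and how the output is captured (the command prompts for a password) is left as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads "dir-cli state get" output on stdin and prints the
# state name in upper case, e.g. "Directory Server State: Standalone (8)"
# becomes "STANDALONE". The password prompt text, if present, is ignored.
vmdir_state_name() {
    sed -n 's/.*Directory Server State: *\([A-Za-z_]*\).*/\1/p' \
        | head -n 1 \
        | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: succeeds only when the state read on stdin is NORMAL.
vmdir_is_normal() {
    [ "$(vmdir_state_name)" = "NORMAL" ]
}
```

If vmdir_is_normal fails for the captured output, the state set command above is the step to run next.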
Remove the staged update files and the update state:
rm -rf /storage/core/software-update/stage/
rm /storage/db/patching.db
rm /etc/applmgmt/appliance/software_update_state.conf
rm -rf /storage/seat/software-update########
Note: '/software-update########' in the above command is dynamically generated (see /var/log/vmware/applmgmt/update_microservice.log for reference).
Reboot the appliance:
reboot
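Because the software-update directory name is generated at run time, it has to be read from the log before the last rm command can be run. The following is a minimal sketch under that assumption; the function name find_update_dirs is hypothetical, and the pattern simply matches any path component that begins with /storage/seat/software-update.

```shell
#!/bin/sh
# Hypothetical helper: prints each distinct /storage/seat/software-update*
# directory mentioned in the log file given as $1, one per line.
find_update_dirs() {
    grep -o '/storage/seat/software-update[^/[:space:]]*' "$1" | sort -u
}
```

Running find_update_dirs against /var/log/vmware/applmgmt/update_microservice.log lists the candidate directories; review the result before passing any of them to rm -rf.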