vCenter 8.0 U2 patching fails with the error "Failed to perform Cleanup"

Article ID: 313290

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

  • In /var/log/vmware/applmgmt/update_microservice.log, entries similar to the following can be found:

YYYY-MM-DD HH:MM:SS,995 - 97765 - update_functions_target::                      __init__: 770 -    DEBUG - Running python /storage/seat/software-update########/stage/update/snapshot.py --lvmOp cleanup --stageDir /storage/seat/software-update########/stage
YYYY-MM-DD HH:MM:SS,577 - 97765 - update_functions_target::      runCommandAndCheckResult: 428 -    DEBUG - runCommandAndCheckResult failed: '1+0 records in\n1+0 records out\n16 bytes copied, #.#### s, ###kB/s\nTraceback (most recent call last):\n  File "/storage/seat/software-update########/stage/update/snapshot.py", line 389, in <module>\n    main()\n 
YYYY-MM-DD HH:MM:SS,577 - 97765 -    update_b2b_target::             _cleanup_snapshot:2677 -    ERROR - Failed to cleanup vcenter snapshot; err: Failed to cleanup snapshot
YYYY-MM-DD HH:MM:SS,578 - 97765 -  task_manager_target::                        update:  91 -    DEBUG - UpdateTask: status=FAILED, progress=80, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Failed to perform cleanup']}, failure_state=None
YYYY-MM-DD HH:MM:SS,624 - 97765 -  update_microservice::                 waitForEvents: 517 -     INFO - Exiting by timeout
YYYY-MM-DD HH:MM:SS,624 - 97765 -  update_microservice::                _deletePidFile: 341 -    DEBUG - Removing pid file: /var/run/vmware/applmgmt/update_microservice.pid
YYYY-MM-DD HH:MM:SS,683 - 97765 -  update_microservice::                       __del__: 403 -    DEBUG - Closing socket...
YYYY-MM-DD HH:MM:SS,685 - 97765 -  update_microservice::                       __del__: 405 -    DEBUG - Removing sockfile
  • The Cap Engine log at /var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log shows the following entries, confirming that the cleanup completed, although it took longer than the timeout period:

[YYYY-MM-DDTHH:MM:SS] progress.go:11: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] task_progress.go:24: Reclaim task complete.
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:221: Task reclaim-vfree completed
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:183: All tasks finished for workflow
[YYYY-MM-DDTHH:MM:SS] workflow_manager.go:354: Updating instance status to Completed
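
The two symptoms above can be checked together with a short shell sketch. This is only an illustration: the function name cleanup_timeout_signature is hypothetical, the search strings are copied from the log excerpts in this article, and the log files may also contain entries from earlier update attempts.

```shell
#!/bin/bash
# Hypothetical helper (not part of vCenter): returns 0 when both log
# signatures quoted in this article are present.
# Arguments default to the log paths named in this article.
cleanup_timeout_signature() {
    local update_log="${1:-/var/log/vmware/applmgmt/update_microservice.log}"
    local cap_log="${2:-/var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log}"

    # The patching workflow reported the snapshot cleanup failure.
    grep -q "Failed to cleanup vcenter snapshot" "$update_log" 2>/dev/null || return 1
    # The Cap Engine workflow nevertheless reached the Completed status.
    grep -q "Updating instance status to Completed" "$cap_log" 2>/dev/null || return 1
    return 0
}

if cleanup_timeout_signature; then
    echo "Both signatures found: the cleanup finished after the patching workflow timed out."
else
    echo "Both signatures were not found; this article may not apply."
fi
```

The check only reads the two log files and changes nothing on the appliance.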

Environment

VMware vCenter Server 8.0.2

Cause

LVM takes longer than expected to clean up snapshots, so the patching workflow times out. The patch itself has been applied successfully; the failure occurs during the post-patching cleanup phase.

Resolution

Workaround:

To work around and resolve the issue, please choose one of the following options:

Option 1: To run the steps automatically using a script, follow the instructions below:

  • After the patch fails with the error "Failed to perform Cleanup", download the updateStateRemover.sh script attached to this KB to the /tmp directory on the vCenter Server.

  • Log in to the vCSA VM via SSH using the root credentials.

  • Make the script executable by running the following command in the directory where the script was saved:
    • chmod +x updateStateRemover.sh

  • Run the script using the command: ./updateStateRemover.sh (file names are case-sensitive; use the name exactly as the file was saved)

  • Reboot the vCenter using the command below:
    • reboot
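
Because the script name appears with different capitalization in this article and file names on the appliance are case-sensitive, the following sketch locates the downloaded file without assuming a particular spelling and makes it executable. The function name find_state_remover is hypothetical, and /tmp is assumed as the download location as in the steps above. Running the script and rebooting are left to the operator.

```shell
#!/bin/bash
# Hypothetical helper: print the path of the downloaded script in the given
# directory, matching the file name case-insensitively.
find_state_remover() {
    local dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -type f -iname 'updatestateremover.sh' -print | head -n 1
}

script_path="$(find_state_remover /tmp)"
if [ -n "$script_path" ]; then
    # Same effect as the chmod step above, applied to the file actually found.
    chmod +x "$script_path"
    echo "Ready to run: $script_path"
else
    echo "Script not found in /tmp; check the download location."
fi
```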

Option 2: To perform the steps manually, please follow the instructions outlined below:

If the VMDIRD state is not set to NORMAL, follow the steps below to configure it.

  • Log in to the Replication Partner vCenter using an SSH client (Embedded vCenter Server).

  • Switch to the shell by running the command below:
    • shell

  • Check the current state of VMDIR by running the command below (you will be prompted for the SSO administrator credentials):
    • /usr/lib/vmware-vmafd/bin/dir-cli state get
    • Example:
      /usr/lib/vmware-vmafd/bin/dir-cli state get
      Enter password for [email protected]:
      Directory Server State: Standalone (8)

  • If the state is "Standalone", run the command below to set VMDIR to the NORMAL state. If it is already in the NORMAL state, skip this step and proceed to refreshing the Likewise Service Manager.
    • /usr/lib/vmware-vmafd/bin/dir-cli state set --state NORMAL
      Enter password for [email protected]:
      Directory Server State set to: NORMAL (3)

  • Execute the command below to verify the status and confirm the change.
    • /usr/lib/vmware-vmafd/bin/dir-cli state get
      Enter password for [email protected]:
      Directory Server State: Normal (3)

  • Refresh the Likewise Service Manager by executing the command below:
    • /opt/likewise/bin/lwsm refresh

  • Execute the commands below to clean up the stale patching state.
    • rm -rf /storage/core/software-update/stage/
    • rm /storage/db/patching.db
    • rm /etc/applmgmt/appliance/software_update_state.conf
    • rm -rf /storage/seat/software-update########
      • Note: The '########' part of the directory name 'software-update########' in the above command is generated dynamically. The actual name appears in /var/log/vmware/applmgmt/update_microservice.log.

  • Reboot the vCenter by running the command below:
    • reboot
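
The dynamically generated staging directory referred to in the note above can be read from the update log rather than typed by hand. The sketch below is an illustration under stated assumptions: the function name staging_dirs_from_log is hypothetical, and the pattern assumes the directory appears in the log in the same form as in the excerpts in this article. It only lists the directories and deletes nothing.

```shell
#!/bin/bash
# Hypothetical helper: print each distinct /storage/seat/software-update*
# directory mentioned in the update log, one per line.
staging_dirs_from_log() {
    local update_log="${1:-/var/log/vmware/applmgmt/update_microservice.log}"
    # -o prints only the matched path; the match stops at the next "/" or space.
    grep -o '/storage/seat/software-update[^/[:space:]]*' "$update_log" 2>/dev/null | sort -u
}

# Report which of the logged staging directories still exist on disk.
staging_dirs_from_log | while IFS= read -r dir; do
    if [ -d "$dir" ]; then
        echo "Stale staging directory present: $dir"
    else
        echo "Already removed: $dir"
    fi
done
```

Reviewing the printed path before passing it to rm -rf avoids removing the wrong directory if the log contains entries from more than one update attempt.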

Attachments

updatestateremover.sh