vCenter 8.0 U2 patching fails with the error "Failed to perform Cleanup"

Article ID: 313290

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • The following entries in update_microservice.log confirm this scenario:

2023-09-28 14:39:12,436 - 116521 - snapshot:: check_status: 274 - DEBUG - workflow execution is not finished; current status: {'workflowName': 'cap-lvm-snapshot-cleanup', 'instanceId': 'e1aa4364-6d10-46a6-88b8-3b7f3963bb88', 'task': 'reclaim-vfree', 'status': 'Running', 'message': 'Reclaimed snapshot /dev/sdq12', 'progress': '50%'}
2023-09-28 14:39:12,436 - 116521 - snapshot:: cleanup: 191 - ERROR - Error occurred while performing snapshot cleanup; error: {workflow is not finished}
2023-09-28 14:39:12,436 - 116521 - update_b2b_target:: _cleanup_snapshot:2621 - ERROR - Failed to cleanup vcenter snapshot; err: workflow is not finished
2023-09-28 14:39:12,436 - 116521 - task_manager_target:: update: 92 - DEBUG - UpdateTask: status=FAILED, progress=80, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Failed to perform cleanup']}, failure_state=None

  • The CAP Engine log shows the entries below, which confirm that the cleanup completed but took longer than the timeout.
    Note: The timestamps in the two logs are not directly comparable. The update microservice appears to log in UTC, while the CAP Engine logs in local time. (Log file path: /var/log/vmware/capengine/cap-lvm-snapshot-cleanup/workflow.log)

2023/09/28 16:40:52.008088 progress.go:11: Reclaim task complete.
2023/09/28 16:40:52.008300 task_progress.go:24: Reclaim task complete.
2023/09/28 16:40:52.014217 workflow_manager.go:221: Task reclaim-vfree completed
2023/09/28 16:40:52.014251 workflow_manager.go:183: All tasks finished for workflow
2023/09/28 16:40:52.014265 workflow_manager.go:354: Updating instance status to Completed
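The time zone mismatch makes it hard to see how late the cleanup actually finished. The sketch below normalizes the two timestamps from the excerpts above and prints the gap in seconds. It assumes GNU `date` and a local offset of UTC+2, which is only inferred from the roughly two-hour difference between the logs; adjust `offset_hours` for your appliance.

```shell
# Timestamps copied from the two log excerpts above.
fail_utc="2023-09-28 14:39:12"     # update_microservice.log (appears to be UTC)
done_local="2023-09-28 16:40:52"   # CAP Engine workflow.log (local time)

# ASSUMPTION: local time is UTC+2; this is inferred, not stated in the logs.
offset_hours=2

# Parse both strings as if they were UTC, then subtract the assumed offset
# from the local timestamp to bring it onto the same timeline.
fail_epoch=$(date -u -d "$fail_utc" +%s)
done_epoch=$(( $(date -u -d "$done_local" +%s) - offset_hours * 3600 ))

delta=$(( done_epoch - fail_epoch ))
echo "cleanup finished ${delta}s after the update task was marked FAILED"
```

Under that assumed offset the cleanup finished 100 seconds after the failure was recorded, which is consistent with a slow cleanup rather than a failed one.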

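To check a log file for the first symptom without reading it by hand, the failure signature can be searched for directly. This is a minimal sketch: `has_cleanup_failure` is a hypothetical helper name, and the log path in the usage comment is an assumption that may differ on your appliance.

```shell
# Hypothetical helper: succeeds if the given log file contains the
# snapshot-cleanup failure signature quoted in the symptom above.
has_cleanup_failure() {
    grep -q 'Failed to cleanup vcenter snapshot' "$1"
}

# Usage (path is an assumption; point it at your update_microservice.log):
#   has_cleanup_failure /var/log/vmware/applmgmt/update_microservice.log \
#       && echo "matches the cleanup-timeout scenario"
```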
 


Environment

VMware vCenter Server 8.0.2

Cause

LVM takes longer than expected to clean up snapshots, so the patching workflow times out (the timeout is currently 4 minutes). The patch itself is applied successfully; the snapshot cleanup is a post-patching operation.

Resolution



    Workaround:

    To work around and resolve the issue, follow either of the options below:

    Option 1: To execute the steps automatically with a script, follow the instructions below.

    1. Download the updateStateRemover.sh script attached to this KB to the vCenter Server after the patch has failed with the error "Failed to perform Cleanup".

    2. Log in to the vCSA as root using an SSH client (PuTTY or any similar client).

    3. Make the script executable: chmod +x updateStateRemover.sh

    4. Run the script: ./updateStateRemover.sh


    Option 2: To execute the steps manually, follow the steps below.

    Because VMDIRD is still in Standalone mode, the patching workflow has not completed successfully. Follow the steps below to set the VMDIRD state to NORMAL.

    1. Log in to the replication partner vCenter (embedded vCenter Server) using an SSH client.
    2. Switch to the Bash shell by running the command below:
    shell
    3. Check the current state of VMDIR by running the command below (it prompts for the SSO administrator credentials):
    /usr/lib/vmware-vmafd/bin/dir-cli state get
    Example:
    /usr/lib/vmware-vmafd/bin/dir-cli state get
    Enter password for [email protected]:
    Directory Server State: Standalone (8)
    4. If the state is Standalone, run the command below to set VMDIR to the NORMAL state. If it is already NORMAL, skip this step and proceed to step 6 to refresh the Likewise Service Manager.
    /usr/lib/vmware-vmafd/bin/dir-cli state set --state NORMAL
    Enter password for [email protected]:
    Directory Server State set to: NORMAL (3)
    5. Run the command below to confirm the change:
    /usr/lib/vmware-vmafd/bin/dir-cli state get
    Enter password for [email protected]:
    Directory Server State: Normal (3)
    6. Refresh the Likewise Service Manager by running the command below:
    /opt/likewise/bin/lwsm refresh
    7. rm -rf /storage/<subdir>/software-update/stage/
    8. rm /storage/db/patching.db
    9. rm /etc/applmgmt/appliance/software_update_state.conf
    10. rm -rf /storage/<subdir>/software-updatemh63juvn
    Note: The suffix of the directory name '/software-updateXXXX' in the command above (shown here as 'mh63juvn') is dynamically generated, so it differs on each system.
    11. Reboot vCenter.
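The decision in the manual steps (set the state to NORMAL only when VMDIR reports Standalone) can be sketched as a small check on the `dir-cli state get` output. `is_standalone` is a hypothetical helper, and the sketch runs against the sample output from the example rather than calling `dir-cli`, so it is safe to try anywhere.

```shell
# Hypothetical helper: reads `dir-cli state get` output on stdin and
# succeeds only if the reported state is Standalone.
is_standalone() {
    grep -q 'Directory Server State: Standalone'
}

# Sample output copied from the example in the steps above.
sample='Directory Server State: Standalone (8)'

if printf '%s\n' "$sample" | is_standalone; then
    echo "State is Standalone: run 'dir-cli state set --state NORMAL'."
else
    echo "State is not Standalone: skip ahead to the lwsm refresh."
fi
```

On an appliance, the saved output of `/usr/lib/vmware-vmafd/bin/dir-cli state get` would take the place of `sample`.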

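Because the suffix of the software-update directory is generated dynamically, it can help to list the matching directories before removing anything. The sketch below is a dry run only: `list_update_dirs` is a hypothetical helper, and the demo uses a throwaway directory rather than the real /storage location, whose subdirectory this article does not specify.

```shell
# Hypothetical helper: print every directory directly under the given parent
# whose name is "software-update" followed by at least one extra character.
# The bare "software-update" directory is deliberately not matched.
list_update_dirs() {
    find "$1" -maxdepth 1 -type d -name 'software-update?*'
}

# Demo against a temporary directory so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/software-update" "$demo/software-updatemh63juvn"
list_update_dirs "$demo"
rm -rf "$demo"
```

Only the suffixed directory is listed in the demo; on an appliance, the parent directory of the software-update folder would be passed instead.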

    Attachments

    updateStateRemover