This KB provides instructions to work around the update failures described below.
The SDDC Manager update to VCF 4.5.0.0 fails at the "VMware Cloud Foundation Service and Platform Upgrades" step. The following error is reported in the SDDC Manager UI:
or
/var/log/vmware/capengine/cap-update/workflow.log indicates "Task validate failed" due to unexpected free space in a volume group.
or
Check the below two log files
for errors in reclaiming snapshot disks (example error messages below)
Task Failed Error
2022/10/31 09:19:49.463490 validate.go:99: Debug: vgname:[data_vg] actualVFreeSize: [24996] vFreeSize:[26214] toleranceAllowed:[3932]
2022/10/31 09:19:49.527247 validate.go:99: Debug: vgname:[lcmmount_vg] actualVFreeSize: [124568] vFreeSize:[104857] toleranceAllowed:[15728]
2022/10/31 09:19:49.527298 progress.go:11: Validate failed. VFree size of the volume group lcmmount_vg mismtaches the expectation. Actual: [124568] Expected: [104857].
2022/10/31 09:19:49.527490 task_progress.go:24: Validate failed. VFree size of the volume group lcmmount_vg mismtaches the expectation. Actual: [124568] Expected: [104857].
2022/10/31 09:19:49.556785 workflow_manager.go:198: Task validate failed. Error: Validate failed. VFree size of the volume group lcmmount_vg mismtaches the expectation. Actual: [124568] Expected: [104857].
2022/10/31 09:19:49.556950 workflow_manager.go:138: Stopping workflow execution as task validate failed
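The Debug lines in the validate excerpt carry enough numbers to see why lcmmount_vg fails while data_vg does not. The sketch below assumes (the log does not state the rule explicitly) that validation passes when the actual free size lies within toleranceAllowed of the expected vFreeSize. vfree_within_tolerance is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SDDC Manager.
# Assumption: the check passes when |actual - expected| <= tolerance.
vfree_within_tolerance() {
    local actual=$1 expected=$2 tolerance=$3
    local diff=$(( actual - expected ))
    # Take the absolute value of the difference.
    if (( diff < 0 )); then
        diff=$(( -diff ))
    fi
    (( diff <= tolerance ))
}

# Values copied from the validate.go Debug lines in the log excerpt.
if vfree_within_tolerance 24996 26214 3932; then
    echo "data_vg: within tolerance"
else
    echo "data_vg: mismatch"
fi

if vfree_within_tolerance 124568 104857 15728; then
    echo "lcmmount_vg: within tolerance"
else
    echo "lcmmount_vg: mismatch"
fi
```

Under that assumption, data_vg differs by 1218 (allowed 3932) and passes, while lcmmount_vg differs by 19711 (allowed 15728) and fails, which matches the "Validate failed" message for lcmmount_vg.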
2022/11/03 21:12:26.914537 reclaimvfree.go:242: Executing command: vgreduce data_vg /dev/sdg1
2022/11/03 21:12:27.014444 reclaimvfree.go:253: Executing command: pvremove -y -ff /dev/sdg1
2022/11/03 21:12:27.126447 reclaimvfree.go:264: Executing command: parted -s -a opt /dev/sdg rm 1
2022/11/03 21:12:27.167333 progress.go:11: Reclaimed snapshot /dev/sdg1
2022/11/03 21:12:27.167401 reclaimvfree.go:242: Executing command: vgreduce lcmmount_vg /dev/sdg2
2022/11/03 21:12:27.167730 task_progress.go:24: Reclaimed snapshot /dev/sdg1
2022/11/03 21:12:27.286985 reclaimvfree.go:253: Executing command: pvremove -y -ff /dev/sdg2
2022/11/03 21:12:27.374610 reclaimvfree.go:264: Executing command: parted -s -a opt /dev/sdg rm 2
2022/11/03 21:12:27.400884 progress.go:11: Reclaimed snapshot /dev/sdg2
2022/11/03 21:12:27.401049 reclaimvfree.go:242: Executing command: vgreduce lcmmount_vg /dev/sdg2
2022/11/03 21:12:27.401154 task_progress.go:24: Reclaimed snapshot /dev/sdg2
2022/11/03 21:12:27.478621 progress.go:11: Failed to reclaim snapshot disk /dev/sdg2 from VG lcmmount_vg. Error : exit status 5
2022/11/03 21:12:27.478859 task_progress.go:24: Failed to reclaim snapshot disk /dev/sdg2 from VG lcmmount_vg. Error : exit status 5
2022/11/03 21:12:27.491478 workflow_manager.go:198: Task reclaim-vfree failed. Error: Failed to reclaim snapshot disk /dev/sdg2 from VG lcmmount_vg. Error : exit status 5
2022/11/03 21:12:27.491630 workflow_manager.go:138: Stopping workflow execution as task reclaim-vfree failed

2022/11/03 20:40:06.100186 reclaimvfree.go:242: Executing command: vgreduce data_vg /dev/sdg1
2022/11/03 20:40:06.292377 reclaimvfree.go:253: Executing command: pvremove -y -ff /dev/sdg1
2022/11/03 20:40:06.444020 reclaimvfree.go:264: Executing command: parted -s -a opt /dev/sdg rm 1
2022/11/03 20:40:06.538938 progress.go:11: Reclaimed snapshot /dev/sdg1
2022/11/03 20:40:06.539027 reclaimvfree.go:242: Executing command: vgreduce lcmmount_vg /dev/sde /dev/sdg2
2022/11/03 20:40:06.539239 task_progress.go:24: Reclaimed snapshot /dev/sdg1
2022/11/03 20:40:06.772812 progress.go:11: Failed to reclaim snapshot disk /dev/sde /dev/sdg2 from VG lcmmount_vg. Error : exit status 126
2022/11/03 20:40:06.773629 task_progress.go:24: Failed to reclaim snapshot disk /dev/sde /dev/sdg2 from VG lcmmount_vg. Error : exit status 126
2022/11/03 20:40:06.819900 workflow_manager.go:198: Task reclaim-vfree failed. Error: Failed to reclaim snapshot disk /dev/sde /dev/sdg2 from VG lcmmount_vg. Error : exit status 126
2022/11/03 20:40:06.819970 workflow_manager.go:138: Stopping workflow execution as task reclaim-vfree failed

2022/11/07 09:35:18.875054 reclaimvfree.go:242: Executing command: vgreduce lcmmount_vg /dev/sdc /dev/sdg2
2022/11/07 09:35:18.875229 task_progress.go:24: Reclaimed snapshot /dev/sdg2
2022/11/07 09:35:18.941316 progress.go:11: Failed to reclaim snapshot disk /dev/sdc /dev/sdg2 from VG lcmmount_vg. Error : exit status 127
2022/11/07 09:35:18.941490 task_progress.go:24: Failed to reclaim snapshot disk /dev/sdc /dev/sdg2 from VG lcmmount_vg. Error : exit status 127
2022/11/07 09:35:18.959857 workflow_manager.go:198: Task reclaim-vfree failed. Error: Failed to reclaim snapshot disk /dev/sdc /dev/sdg2 from VG lcmmount_vg. Error : exit status 127
2022/11/07 09:35:18.959911 workflow_manager.go:138: Stopping workflow execution as task reclaim-vfree failed
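To check a log for the failure signatures shown in these excerpts, something like the following may help. It is a minimal sketch: find_update_failures is a hypothetical helper name, and the patterns are taken only from the example messages in this KB, so other wording would not be matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines matching the failure messages quoted
# in this KB. Usage: find_update_failures <path-to-workflow.log>
find_update_failures() {
    local log_file=$1
    grep -E 'Task (validate|reclaim-vfree) failed|Failed to reclaim snapshot disk' "$log_file"
}

# Example (path taken from this KB):
#   find_update_failures /var/log/vmware/capengine/cap-update/workflow.log
```

grep returns a non-zero exit status when nothing matches, so the function can also be used in an if statement to test whether a given log shows this issue.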
The presence of multiple PVs in a volume group causes this failure. To confirm this, check how many PVs each volume group contains.
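One way to look for volume groups backed by more than one PV is sketched below. It assumes the standard LVM2 pvs command is available on the appliance and is run as root. vgs_with_multiple_pvs is a hypothetical helper name, and this is not necessarily the check the KB authors intended.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "vg_name pv_name" pairs on stdin and prints
# each volume group that has more than one physical volume, with its count.
# PVs that belong to no VG produce a single field and are skipped.
vgs_with_multiple_pvs() {
    awk 'NF >= 2 { count[$1]++ }
         END { for (vg in count) if (count[vg] > 1) print vg, count[vg] }'
}

# On the appliance (assumption: LVM2 tools installed, run as root):
#   pvs --noheadings -o vg_name,pv_name | vgs_with_multiple_pvs
```

Any volume group printed by the pipeline has multiple PVs; no output means every volume group has at most one.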
Currently there is no resolution. We are working on a fix.
Pre-requisite:
Procedure:
Assign execute permission to the script using the following commands:
cd /home/vcf
chmod +x update_failure_workaround.sh
Run the below command to identify the snapshot device name:
grep "Configured" /var/log/vmware/capengine/cap-required-hardware-addition/workflow.log | grep "/storage/lvm_snapshot"
example output:
Configured disk "/dev/sdg" in the appliance and mounted on /storage/lvm_snapshot
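If you prefer not to copy the device name by hand, the matching log line can be reduced to just the device path. This is a sketch that assumes the line has exactly the shape of the example output (device path in straight double quotes after "Configured disk"). extract_snapshot_device is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints the device path
# from the last line of the form
#   Configured disk "<device>" ... /storage/lvm_snapshot
extract_snapshot_device() {
    sed -n 's|.*Configured disk "\([^"]*\)".*/storage/lvm_snapshot.*|\1|p' | tail -n 1
}

# Example (log path taken from this KB):
#   grep "Configured" /var/log/vmware/capengine/cap-required-hardware-addition/workflow.log \
#     | extract_snapshot_device
```

If the function prints nothing, the log line did not match the assumed format; fall back to reading the grep output directly.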
Perform the cleanup using the following command
./update_failure_workaround.sh <Snapshot Device>
example usage:
./update_failure_workaround.sh /dev/sdg
example output (check for "Success" at the end):
INFO Remove Snapshots if present
. . . .
INFO Mount all filesystems mentioned in fstab
INFO lvm_snapshot is mounted successfully
INFO Cleanup Done.
INFO altered cap update workflows
INFO Success
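The "Success" check can also be done mechanically. The sketch below assumes the script writes its messages to standard output and that the last non-empty line contains "Success", as in the example output. output_reports_success is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the script output on stdin and succeeds only
# if the last non-empty line contains "Success".
output_reports_success() {
    grep -v '^[[:space:]]*$' | tail -n 1 | grep -q 'Success'
}

# Example (assumption: messages go to standard output):
#   ./update_failure_workaround.sh /dev/sdg | tee /tmp/workaround.out
#   output_reports_success < /tmp/workaround.out && echo "cleanup succeeded"
```

Saving the output with tee keeps the full log available for review if the check fails.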
Once the update finishes, remove the workaround script by running the below command:
rm /home/vcf/update_failure_workaround.sh