SDDC Manager upgrade fails at Third_Party "Setup Common Appliance Platform" stage with "Hardware Addition workflow failed"

Article ID: 392399

Products

VMware SDDC Manager VMware Cloud Foundation

Issue/Introduction

  • SDDC Manager upgrade fails at Third_Party "Setup Common Appliance Platform" stage with "Hardware Addition workflow failed"
  • Running the "lsblk" command shows that the lvm_snapshot mount is NOT created
  • The below screenshot shows a normal environment with lvm_snapshot mounted, as a point of reference

     

  • The SDDC Manager third-party log (/var/log/vmware/vcf/lcm/thirdparty/upgrades/<upgrade_id>/vcf-platform/cap-platform-setup/cap_platform_setup.log) shows output similar to the following snippet:
    INFO: http://127.0.0.1:15051/capengine/api/v1/workflows is up
    INFO: /var/log/vmware/capengine/vcf-cap-workflows.json
    INFO: URL: http://127.0.0.1:15051/capengine/api/v1/workflow/instance/7c66####-####-####-########3361
    INFO: Info: {'workflowName': 'cap-required-hardware-addition', 'instanceId': '7c66####-####-####-########3361', 'task': 'add-disk', 'status': 'Failed', 'message': 'failed to mount file system. Output: . ERROR: exit status 64', 'progress': '0%'}
    INFO: URL: http://localhost/lcm/about
    INFO: Info: {'name': 'LCM', 'serviceId': 'b8e5####-####-####-########ca5c', 'author': 'VMware', 'builtBy': 'mts', 'createdBy': 'Apache Maven 3.5.0', 'buildJdk': '1.8.0-jdk8u322-ga', 'version': '4.4.1-vcf4411RELEASE-21506568', 'buildDate': '2023-03-27 08:32:22 UTC'}
    ERROR: Hardware Addition workflow failed
    INFO: Updated /var/log/vmware/vcf/lcm/thirdparty/upgrades/2b04####-####-####-########cb05/vcf-platform/cap-platform-setup/cap_platform_setup.status status file with data OrderedDict([('upgradeId', '2b04####-####-####-########cb05'), ('resourceId', 'afdd####-####-####-########c4be'), ('upgradeStatusCode', 'COMPLETED_WITH_FAILURE'), ('progress', 0), ('error', {'errorCode': None, 'errorDescription': 'Hardware Addition workflow failed'}), ('endTime', 1743093871)])
    INFO: Execute cmd: lsblk -o NAME,TYPE,MOUNTPOINT -n -i -r | grep alt_root
    INFO: vg_alt_root-lv_alt_root lvm /storage/alt_root
    
    INFO: RC: 0, OUT: vg_alt_root-lv_alt_root lvm /storage/alt_root
    
    INFO: Execute cmd: lsblk -o NAME,TYPE,MOUNTPOINT -n -i -r | grep lvm_snapshot
    INFO:
    INFO: RC: 1, OUT:
    INFO: URL: http://localhost/lcm/about
    INFO: Info: {'name': 'LCM', 'serviceId': 'b8e5####-####-####-########ca5c', 'author': 'VMware', 'builtBy': 'mts', 'createdBy': 'Apache Maven 3.5.0', 'buildJdk': '1.8.0-jdk8u322-ga', 'version': '4.4.1-vcf4411RELEASE-21506568', 'buildDate': '2023-03-27 08:32:22 UTC'}
    INFO: Updated /var/log/vmware/vcf/lcm/thirdparty/upgrades/2b04####-####-####-########cb05/vcf-platform/cap-platform-setup/cap_platform_setup.status status file with data OrderedDict([('upgradeId', '2b04####-####-####-########cb05'), ('resourceId', 'afdd####-####-####-########c4be'), ('upgradeStatusCode', 'COMPLETED_WITH_FAILURE'), ('progress', 0), ('error', OrderedDict([('errorCode', 1), ('errorDescription', 'Hardware Addition workflow failed')])), ('endTime', 1743093871)])
    
    ERROR:
    Traceback (most recent call last):
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/cap_platform_setup.py.copy", line 271, in <module>
        migration_workflow()
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/cap_platform_setup.py.copy", line 186, in migration_workflow
        wrapper.execute_cmd_locally("lsblk -o NAME,TYPE,MOUNTPOINT -n -i -r | grep lvm_snapshot")
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/../../wrapper.py", line 192, in execute_cmd_locally
        self.update_status(return_code=rc, status='COMPLETED_WITH_FAILURE', errmsg=err)
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/../../wrapper.py", line 168, in update_status
        raise Exception
    Exception
  • The cap_platform_setup.log also shows "CAP services are not enabled in SDDC Manager". This message is misleading: in this scenario the cap-workflow-engine.service is running throughout the whole process, so following KB 376799 does not resolve the issue.
    INFO: URL: http://localhost/lcm/about
    INFO: Info: {'name': 'LCM', 'serviceId': 'b8e5####-####-####-########ca5c', 'author': 'VMware', 'builtBy': 'mts', 'createdBy': 'Apache Maven 3.5.0', 'buildJdk': '1.8.0-jdk8u322-ga', 'version': '4.4.1-vcf4411RELEASE-21506568', 'buildDate': '2023-03-27 08:32:22 UTC'}
    INFO: Updated /var/log/vmware/vcf/lcm/thirdparty/upgrades/2b04####-####-####-########cb05/vcf-platform/cap-platform-setup/cap_platform_setup.status status file with data OrderedDict([('upgradeId', '2b04####-####-####-########cb05'), ('resourceId', 'afdd####-####-####-########c4be'), ('upgradeStatusCode', 'COMPLETED_WITH_FAILURE'), ('progress', 0), ('error', OrderedDict([('errorCode', 1), ('errorDescription', 'Hardware Addition workflow failed')])), ('endTime', 1743093871)])
    ERROR: CAP services are not enabled in SDDC Manager
    INFO:
    INFO: RC: 1, OUT:
    INFO: ERR: Traceback (most recent call last):
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/cap_platform_setup.py.copy", line 271, in <module>
        migration_workflow()
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/cap_platform_setup.py.copy", line 186, in migration_workflow
        wrapper.execute_cmd_locally("lsblk -o NAME,TYPE,MOUNTPOINT -n -i -r | grep lvm_snapshot")
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/../../wrapper.py", line 192, in execute_cmd_locally
        self.update_status(return_code=rc, status='COMPLETED_WITH_FAILURE', errmsg=err)
      File "/var/log/vmware/vcf/lcm/thirdparty/bundles/363bd141-7d19-4287-9c7a-091c11042ca0/thirdparty/cap-platform-setup/bin/../../wrapper.py", line 168, in update_status
        raise Exception
    Exception
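
The symptoms above can be checked from a root shell on the SDDC Manager appliance before attempting the resolution. The following is a minimal sketch rather than part of the documented procedure: the helper names has_mount and has_add_disk_failure are hypothetical, and <upgrade_id> must be replaced with the ID of the failed upgrade. It reuses the lsblk invocation and the failure message that appear in the log snippets above.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of any VMware tooling.

# has_mount NAME
# Reads `lsblk -o NAME,TYPE,MOUNTPOINT -n -i -r` output on stdin and succeeds
# if any line mentions NAME (the same grep the setup script runs).
has_mount() {
    grep -q -- "$1"
}

# has_add_disk_failure LOGFILE
# Succeeds if LOGFILE contains the remount failure reported by the add-disk task.
has_add_disk_failure() {
    grep -q "failed to mount file system" "$1"
}

# Example usage on the appliance (not executed here):
#   lsblk -o NAME,TYPE,MOUNTPOINT -n -i -r | has_mount lvm_snapshot \
#       || echo "lvm_snapshot mount is missing"
#   has_add_disk_failure /var/log/vmware/vcf/lcm/thirdparty/upgrades/<upgrade_id>/vcf-platform/cap-platform-setup/cap_platform_setup.log \
#       && echo "add-disk task failed to remount the file system"
```

If alt_root is listed but lvm_snapshot is not, and the log contains the remount failure, the environment matches the condition described in this article.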

Cause

  • The filesystem remount fails after the entry for the alt_root disk is added. Because of this failure, the hardware-addition workflow exits and the lvm_snapshot disk is not added to the VM.
  • On subsequent retries of the upgrade from the SDDC Manager UI, LCM does not re-execute either the hardware-addition workflow or the cap-migrateroot workflow, so the upgrade keeps failing because the lvm_snapshot mount point is not present.

Resolution

  1. Open an SSH session to the SDDC Manager as the vcf user, then su to root.
  2. Rename the vcf-cap-workflows.json file (if present)
    mv /var/log/vmware/capengine/vcf-cap-workflows.json /var/log/vmware/capengine/vcf-cap-workflows.json.bkp
  3. Register a user credential with the following shell command (Replace <password> with a password of your choice)
    echo -n <password> | /usr/lib/vmware-capengine/scripts/user_registry.sh add -u apiuser
  4. Create the basic authentication token (Replace <password> with the password chosen in the previous step)
    authHeader="Basic $(echo -n "apiuser:<password>" | base64)"
  5. Start the CAP Hardware Addition Workflow (Replace <vc_ip_address>, <vm_short_name>, <vc_sso_user>, and <vc_sso_user_password>; the user is a vCenter Single Sign-On user in user@domain form, for example administrator@vsphere.local)
    curl --location --request POST 'http://127.0.0.1:15051/capengine/api/v1/workflow/execute' \
    --header 'Content-Type: application/json' \
    --header "Authorization: $authHeader" \
    --data-raw '{
    "workflowName": "cap-required-hardware-addition",
    "operation": "start",
    "parameters" : {
    "VC-Host": "<vc_ip_address>",
    "VM-Name": "<vm_short_name>",
    "TLS-Verify-Cert": "NO",
    "User": "<vc_sso_user>",
    "Password": "<vc_sso_user_password>"
    }
    }' | jq
  6. Wait for the cap-required-hardware-addition workflow to finish, then confirm in the "lsblk" command output that the lvm_snapshot disk has been created.
  7. Start the root migration to LVM workflow (cap-migrateroot)
    curl --location --request POST 'http://127.0.0.1:15051/capengine/api/v1/workflow/execute' \
    --header 'Content-Type: application/json' \
    --header "Authorization: $authHeader" \
    --data-raw '{
    "workflowName": "cap-migrateroot",
    "operation": "start",
    "parameters" : {
    "Root-VG": "vg_system",
    "Root-LV": "lv_root"
    }
    }' | jq
  8. Wait for the cap-migrateroot workflow to finish by monitoring the logs.
  9. Reboot the VM (only after cap-migrateroot workflow is done).
  10. Retry upgrade from UI.
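
Steps 4, 6, and 8 can be scripted. The sketch below is a convenience under stated assumptions, not part of the documented procedure. The helper names (make_auth_header, is_failed, watch_workflow), the 30-second interval, and the default poll limit are hypothetical. It assumes that the workflow instance URL seen in the log snippet accepts the same Authorization header as the execute call, and that its JSON response carries the "status" field shown in that log. <instance_id> is a placeholder for the workflow instance ID. Only the "Failed" status appears in this article, so the loop does not guess a success value and leaves that judgment to the operator.

```shell
#!/bin/bash
# Sketch only: helper names, timing values, and the authenticated GET on the
# instance endpoint are assumptions, not documented CAP engine behavior.

CAP_API="http://127.0.0.1:15051/capengine/api/v1"

# make_auth_header USER PASSWORD
# Prints a Basic authentication header value. printf avoids a trailing newline
# in the credentials; tr removes the line wrap base64 can add for long input.
make_auth_header() {
    printf 'Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# is_failed STATUS
# Succeeds if STATUS is "Failed", the value reported in the log snippet.
is_failed() {
    [ "$1" = "Failed" ]
}

# watch_workflow INSTANCE_ID AUTH_HEADER [MAX_POLLS]
# Prints the workflow status every 30 seconds.
# Returns 1 as soon as "Failed" is reported, 2 when MAX_POLLS is reached.
watch_workflow() {
    local status polls=0 max_polls="${3:-60}"
    while [ "$polls" -lt "$max_polls" ]; do
        status=$(curl --silent --header "Authorization: $2" \
            "$CAP_API/workflow/instance/$1" | jq -r '.status')
        echo "$(date '+%H:%M:%S') status: $status"
        if is_failed "$status"; then
            return 1
        fi
        polls=$((polls + 1))
        sleep 30
    done
    return 2
}

# Example usage on the appliance (not executed here):
#   authHeader=$(make_auth_header apiuser '<password>')
#   watch_workflow '<instance_id>' "$authHeader"
```

Stop the loop with Ctrl+C once the reported status and the workflow logs indicate that the workflow has finished, then continue with the next step of the procedure.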