"An internal error occurred while staging/remediating the host" while preparing ESXi host for NSX

Article ID: 388598


Products

VMware NSX-T Data Center

VMware vSphere ESXi 8.0

VMware NSX

Issue/Introduction

  • The cluster is a vSphere Lifecycle Manager (vLCM) cluster.
  • Preparing the vLCM cluster for NSX fails with: "Solution apply failed on host: '<hostname>' An internal error occurred while staging/remediating the host."
  • Because NSX is installed on the hosts of a vLCM cluster serially, one host at a time, installation is skipped on the remaining hosts in the cluster after this failure.
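The serial behavior described above can be pictured with a short Python model. It is an illustrative sketch only, not NSX or vLCM code: the function `remediate_serially`, the host names, and the `install` callback are hypothetical, and the callback simply stands in for the per-host NSX installation.

```python
from typing import Callable, Dict, List


def remediate_serially(hosts: List[str],
                       install: Callable[[str], bool]) -> Dict[str, str]:
    """Hypothetical model of serial host preparation that stops at the first failure.

    Returns a mapping of host name to "success", "failed", or "skipped".
    """
    status: Dict[str, str] = {}
    failed = False
    for host in hosts:
        if failed:
            # After one host fails, the remaining hosts are not attempted.
            status[host] = "skipped"
            continue
        if install(host):
            status[host] = "success"
        else:
            status[host] = "failed"
            failed = True
    return status


if __name__ == "__main__":
    hosts = ["esxi-01", "esxi-02", "esxi-03"]
    # Simulate the backup/bootbank-lock failure on the second host.
    print(remediate_serially(hosts, lambda h: h != "esxi-02"))
    # {'esxi-01': 'success', 'esxi-02': 'failed', 'esxi-03': 'skipped'}
```

In this model a single failing host is enough to leave every later host unprepared, which is why fixing the first failed host and retrying matters.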

Environment

VMware NSX-T Data Center

VMware NSX

VMware vSphere 8.0

Cause

  • NSX installation failed on the ESXi host because the host failed to back up its configuration after the NSX VIBs were installed.
  • After installing new VIBs, the ESXi host runs the /sbin/backup.sh script to back up its configuration so that the change persists across reboots.
  • The issue can be identified as follows.
  • lifecycle.log shows that the NSX VIBs were successfully installed on the ESXi host:

    2025-01-09T07:49:50Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-exporter-4.1.2.1.0-8.0.22667792
    2025-01-09T07:49:51Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-monitoring-4.1.2.1.0-8.0.22667792

    ......

    2025-01-09T07:50:19Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-ids-4.1.2.1.0-8.0.22667792
    2025-01-09T07:50:20Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-mpa-4.1.2.1.0-8.0.22667792

  • However, the ESXi host then failed to back up its configuration: /sbin/backup.sh returned code 1 because it could not acquire the bootbank lock within 60 seconds. This caused the NSX installation to stop and fail:
    2025-01-09T07:51:54Z In(14) lifecycle[3191482]: BootBankInstaller:336 Copying bootbank /usr/lib/vmware/lifecycle/stagebootbank to /altbootbank
    2025-01-09T07:51:57Z In(14) lifecycle[3191482]: runcommand:248 runcommand called with: args = '/sbin/backup.sh 0 /altbootbank', outfile = None, returnoutput = True, timeout = 0.0.
    2025-01-09T07:52:57Z Db(15) lifecycle[3191482]: HostImage:1454 installer BootBankInstaller failed: Error in running backup script: non-zero code returned.
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: return code: 1
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: output: Bootbank lock is /var/lock/bootbank/d047f72b-d8bf5cda-xxxx-xxxxxxxxxxxx
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: Traceback (most recent call last):
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/runpy.py", line 198, in _run_module_as_main
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/runpy.py", line 88, in _run_code
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 415, in <module>
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 408, in main
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 348, in cmdlineLock
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 269, in lock
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: LockTimeoutError: Failed to claim lock file: timeout of 60 secs exceeded
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: Failed to acquire bootbank lock (status 1)
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: . Clean up the installation.
    2025-01-09T07:52:57Z In(14) lifecycle[3191482]: runcommand:248 runcommand called with: args = ['/usr/lib/vmware/vob/bin/addvob', 'vob.user.esximage.install.error', 'Error in running backup script: non-zero code returned.\nreturn code: 1\noutput: Bootbank lock is /var/lock/bootbank/d047f72b-d8bf5cda-xxxx-xxxxxxxxxxxx\nTraceback (most recent call last):\n  File "/lib64/python3.11/runpy.py", line 198, in _run_module_as_main\n  File "/lib64/python3.11/runpy.py", line 88, in _run_code\n  File "/lib64/python3.11/site-packages/lockfile.py", line 415, in <module>\n  File "/lib64/python3.11/site-packages/lockfile.py", line 408, in main\n  File "/lib64/python3.11/site-packages/lockfile.py", line 348, in cmdlineLock\n  File "/lib64/python3.11/site-packages/lockfile.py", line 269, in lock\nLockTimeoutError: Failed to claim lock file: timeout of 60 secs exceeded\nFailed to acquire bootbank lock (status 1)\n'], outfile = None, returnoutput = True, timeout = 0.0.
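The identification steps above can be automated with a small Python sketch that searches lifecycle.log text for the same signature: NSX VIBs installed, the backup script failing, and a bootbank lock timeout. The function `diagnose_lifecycle_log` is hypothetical; the marker strings are copied from the log excerpt in this article and may differ between ESXi builds.

```python
import re
from typing import Dict, List

# Marker strings copied from the lifecycle.log excerpt in this article.
INSTALL_RE = re.compile(r"Live installing (\S+)")
BACKUP_FAIL_MARKER = "Error in running backup script: non-zero code returned."
LOCK_MARKERS = ("LockTimeoutError", "Failed to acquire bootbank lock")


def diagnose_lifecycle_log(text: str) -> Dict[str, object]:
    """Report whether log text matches the 'backup failed on bootbank lock' pattern."""
    installed: List[str] = INSTALL_RE.findall(text)  # VIB names being live-installed
    backup_failed = BACKUP_FAIL_MARKER in text
    lock_timeout = any(marker in text for marker in LOCK_MARKERS)
    return {
        "installed_vibs": installed,
        "backup_failed": backup_failed,
        "bootbank_lock_timeout": lock_timeout,
        # All three conditions together match the issue described here.
        "matches_this_issue": bool(installed) and backup_failed and lock_timeout,
    }


if __name__ == "__main__":
    # Short sample assembled from the log lines quoted in this article.
    sample = "\n".join([
        "InstallerCommon:1408 Live installing nsx-mpa-4.1.2.1.0-8.0.22667792",
        "HostImage:1454 installer BootBankInstaller failed: "
        "Error in running backup script: non-zero code returned.",
        "LockTimeoutError: Failed to claim lock file: timeout of 60 secs exceeded",
    ])
    print(diagnose_lifecycle_log(sample))
```

To check a real host, read the contents of a copy of lifecycle.log and pass them to `diagnose_lifecycle_log`; a result with `matches_this_issue` set to True points to the bootbank lock problem rather than a failed VIB installation.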

Resolution

  • Investigate whether, and why, /bootbank or /altbootbank is locked.
  • You can also run the backup script manually (for example, /sbin/backup.sh 0 /altbootbank) to confirm that it completes without errors.
  • After confirming that /bootbank and /altbootbank are not locked, retry the NSX installation; it should then succeed.
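The manual backup check can be wrapped in a small Python sketch that runs the backup command, captures its output, and flags the bootbank lock error seen in this issue, where running Python on the host is permitted. The helper `run_backup`, its 300-second timeout, and the lock-error marker check are assumptions for illustration, not part of the product; the default command is the one quoted in the Resolution.

```python
import subprocess
from typing import List, Optional, Tuple

# Manual backup command quoted in this article's Resolution.
DEFAULT_CMD = ["/sbin/backup.sh", "0", "/altbootbank"]
# Message printed when the bootbank lock cannot be acquired (from the log above).
LOCK_ERROR_MARKER = "Failed to acquire bootbank lock"


def run_backup(cmd: Optional[List[str]] = None,
               timeout: float = 300.0) -> Tuple[int, str, bool]:
    """Run the backup command and return (return code, output, lock_error).

    Hypothetical helper: the 300 s timeout is an assumption, chosen to be
    longer than the 60 s bootbank lock timeout seen in the log.
    """
    command = cmd if cmd is not None else DEFAULT_CMD
    try:
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so tracebacks are captured
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; report it as a failure.
        return -1, "timed out after %.0f s" % timeout, False
    output = proc.stdout or ""
    return proc.returncode, output, LOCK_ERROR_MARKER in output
```

On the affected host, a non-zero return code from `run_backup()` together with `lock_error` set to True indicates that the bootbank is still locked, so a retry of the NSX installation would likely fail the same way.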

Additional Information

If this KB does not resolve your issue, see the following KB for further troubleshooting steps: Troubleshooting NSX Installation Operations