Prepare ESXi host for NSX-T failed with "An internal error occurred while staging/remediating the host"

Article ID: 388598

Products

VMware NSX-T Data Center

VMware vCenter Server 8.0

VMware vCenter Server

Issue/Introduction

  • The cluster is a vLCM cluster.
  • Preparing the vLCM cluster for NSX-T fails with "Solution apply failed on host: '<hostname>' An internal error occurred while staging/remediating the host."
  • Because NSX-T installation on a vLCM cluster is performed serially, one host at a time, installation on the hosts remaining after the failed one is skipped.
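
The skip behavior can be pictured with a small conceptual model: hosts are processed in order and the loop stops at the first failure, so later hosts are never attempted. This is a sketch for illustration only, not vLCM code or a vLCM API; `remediate_serially`, the `install` callback and the host names are hypothetical.

```python
from typing import Callable, Dict, List


def remediate_serially(hosts: List[str], install: Callable[[str], bool]) -> Dict[str, str]:
    """Conceptual model only (hypothetical helper): install host by host, stop at the first failure."""
    results: Dict[str, str] = {}
    for index, host in enumerate(hosts):
        if install(host):
            results[host] = "success"
            continue
        results[host] = "failed"
        # Every host after the failed one is never attempted.
        for skipped in hosts[index + 1:]:
            results[skipped] = "skipped"
        break
    return results


if __name__ == "__main__":
    hosts = ["esx-01", "esx-02", "esx-03"]  # hypothetical host names
    # Simulate the failure on the second host.
    print(remediate_serially(hosts, lambda h: h != "esx-02"))
```

With the failure simulated on esx-02, esx-01 reports success, esx-02 reports failed, and esx-03 is reported as skipped.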

Environment

VMware NSX-T Data Center

VMware vCenter Server 8.0

Cause

NSX-T installation failed on the ESXi host because the host could not perform a configuration backup after installing the NSX-T VIBs.

After installing new VIBs, the ESXi host runs the /sbin/backup.sh script to back up its configuration so that the change persists across reboots.

The issue can be identified as follows.

  • lifecycle.log shows that the NSX-T VIBs were installed successfully on the ESXi host:

    2025-01-09T07:49:50Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-exporter-4.1.2.1.0-8.0.22667792
    2025-01-09T07:49:51Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-monitoring-4.1.2.1.0-8.0.22667792

    ......

    2025-01-09T07:50:19Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-ids-4.1.2.1.0-8.0.22667792
    2025-01-09T07:50:20Z Db(15) lifecycle[3191482]: InstallerCommon:1408 Live installing nsx-mpa-4.1.2.1.0-8.0.22667792

  • However, the ESXi host then failed to perform the configuration backup. /sbin/backup.sh could not acquire the bootbank lock within the 60-second timeout and returned a non-zero code, which caused the NSX-T installation to fail:

    2025-01-09T07:51:54Z In(14) lifecycle[3191482]: BootBankInstaller:336 Copying bootbank /usr/lib/vmware/lifecycle/stagebootbank to /altbootbank
    2025-01-09T07:51:57Z In(14) lifecycle[3191482]: runcommand:248 runcommand called with: args = '/sbin/backup.sh 0 /altbootbank', outfile = None, returnoutput = True, timeout = 0.0.
    2025-01-09T07:52:57Z Db(15) lifecycle[3191482]: HostImage:1454 installer BootBankInstaller failed: Error in running backup script: non-zero code returned.
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: return code: 1
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: output: Bootbank lock is /var/lock/bootbank/d047f72b-d8bf5cda-069b-96c5e8aa9bed
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: Traceback (most recent call last):
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/runpy.py", line 198, in _run_module_as_main
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/runpy.py", line 88, in _run_code
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 415, in <module>
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 408, in main
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 348, in cmdlineLock
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]:   File "/lib64/python3.11/site-packages/lockfile.py", line 269, in lock
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: LockTimeoutError: Failed to claim lock file: timeout of 60 secs exceeded
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: Failed to acquire bootbank lock (status 1)
    2025-01-09T07:52:57Z Db(15)[+] lifecycle[3191482]: . Clean up the installation.
    2025-01-09T07:52:57Z In(14) lifecycle[3191482]: runcommand:248 runcommand called with: args = ['/usr/lib/vmware/vob/bin/addvob', 'vob.user.esximage.install.error', 'Error in running backup script: non-zero code returned.\nreturn code: 1\noutput: Bootbank lock is /var/lock/bootbank/d8bf5cda-d047f72b-069b-96c5e8aa9bed\nTraceback (most recent call last):\n  File "/lib64/python3.11/runpy.py", line 198, in _run_module_as_main\n  File "/lib64/python3.11/runpy.py", line 88, in _run_code\n  File "/lib64/python3.11/site-packages/lockfile.py", line 415, in <module>\n  File "/lib64/python3.11/site-packages/lockfile.py", line 408, in main\n  File "/lib64/python3.11/site-packages/lockfile.py", line 348, in cmdlineLock\n  File "/lib64/python3.11/site-packages/lockfile.py", line 269, in lock\nLockTimeoutError: Failed to claim lock file: timeout of 60 secs exceeded\nFailed to acquire bootbank lock (status 1)\n'], outfile = None, returnoutput = True, timeout = 0.0.
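
The two checks above can also be scripted. The following Python sketch scans lifecycle.log for the message patterns shown in the excerpts (the "Live installing nsx-" lines, "Error in running backup script", and the bootbank lock errors) and reports whether the log matches this issue. The function name and output keys are hypothetical, and the script only looks for the strings quoted in this article; pass it the path to lifecycle.log on the affected host.

```python
import re
import sys
from typing import Iterable, List, Optional

# Patterns taken from the lifecycle.log excerpts in this article.
INSTALL_RE = re.compile(r"Live installing (nsx-\S+)")
# Stop at whitespace or a backslash so escaped "\n" in vob messages is not captured.
LOCK_PATH_RE = re.compile(r"Bootbank lock is ([^\s\\]+)")
BACKUP_ERROR = "Error in running backup script"
LOCK_ERRORS = ("LockTimeoutError", "Failed to acquire bootbank lock")


def scan_lifecycle_log(lines: Iterable[str]) -> dict:
    """Summarize NSX VIB installs and bootbank-lock backup failures (hypothetical helper)."""
    installed: List[str] = []
    backup_failed = False
    lock_timeout = False
    lock_path: Optional[str] = None
    for line in lines:
        match = INSTALL_RE.search(line)
        if match:
            installed.append(match.group(1))
        if BACKUP_ERROR in line:
            backup_failed = True
        if any(marker in line for marker in LOCK_ERRORS):
            lock_timeout = True
        if lock_path is None:
            match = LOCK_PATH_RE.search(line)
            if match:
                lock_path = match.group(1)
    return {
        "nsx_vibs_installed": installed,
        "backup_failed": backup_failed,
        "bootbank_lock_timeout": lock_timeout,
        "lock_path": lock_path,
        # Pattern described in this article: VIBs installed, then backup failed on the lock.
        "matches_this_issue": bool(installed) and backup_failed and lock_timeout,
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for key, value in scan_lifecycle_log(log).items():
                print(f"{key}: {value}")
    else:
        print("usage: scan_lifecycle.py <path-to-lifecycle.log>", file=sys.stderr)
```

The reported lock_path is the lock file named in the backup.sh output, which is a starting point for the analysis described under Resolution.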

Resolution

Investigate whether /bootbank or /altbootbank is locked and, if so, why. The lock file involved is reported in the backup.sh output ("Bootbank lock is /var/lock/bootbank/...").

You can also run the backup manually with /sbin/backup.sh (for example, /sbin/backup.sh 0 /altbootbank) to check whether it completes without errors.
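
The manual check can be wrapped in a short Python sketch that runs the example command, returns its exit code, and flags the "Failed to acquire bootbank lock" message seen in the failure above. The helper name is hypothetical, and the sketch assumes it is run on the affected ESXi host; it deliberately sets no timeout so that backup.sh is never killed part-way through.

```python
import os
import subprocess
import sys
from typing import Sequence, Tuple

# Example command from this article.
BACKUP_CMD = ("/sbin/backup.sh", "0", "/altbootbank")
# Message that backup.sh printed in the failure shown in this article.
LOCK_MARKER = "Failed to acquire bootbank lock"


def run_backup(cmd: Sequence[str] = BACKUP_CMD) -> Tuple[int, bool, str]:
    """Run the backup command; return (exit code, lock failure seen, combined output)."""
    # No timeout is set on purpose: interrupting backup.sh part-way through is avoided.
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return proc.returncode, LOCK_MARKER in output, output


if __name__ == "__main__":
    if os.path.exists(BACKUP_CMD[0]):
        code, lock_failure, output = run_backup()
        print(output, end="")
        if code == 0:
            print("Manual backup completed without errors.")
        elif lock_failure:
            print("Backup failed: the bootbank lock could not be acquired.")
        else:
            print(f"Backup failed with return code {code}.")
    else:
        print(f"{BACKUP_CMD[0]} not found; run this on the affected ESXi host.", file=sys.stderr)
```

Running /sbin/backup.sh 0 /altbootbank directly from the ESXi shell is equivalent; the wrapper only adds the check for the lock message.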

Once /bootbank and /altbootbank are confirmed not to be locked, retrying the NSX-T installation should succeed.