ESXi Host PXE Boot Failure via Auto Deploy - Connection Reset

Article ID: 437370

Products

VMware vCenter Server

Issue/Introduction

  • ESXi hosts fail to initiate boot via Auto Deploy.

  • The boot process loops or halts with the error: "Could not boot: Connection reset".

  • Network errors are encountered during the PXE/iPXE phase of the boot process.

  • The /var/log/vmware/rbd/rbd-cache.log file reports that the path "/var/lib/rbd/cache/vital" is missing, for example:

YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger : FileNotFoundError
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger : :
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger : [Errno 2] No such file or directory: '/var/lib/rbd/cache/vital'
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:vc_extension:Using the default linux install config file path
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:vc_extension:Could not find a valid vcAddress in the config file, using the db
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:dbsetup:installed DB connection <sqlite3.Connection object at 0x7ff007bc79d0> for thread <_MainThread(MainThread, started 140668942231360)>
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:networkutil:The configured address: <redacted>
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:vc_extension:Service Address (WAITER_ADDRESS) : 1##.##.##.##
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:cacher:BEGIN validating cache contents
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:rbd-cached:Program exiting due to unknown exception
Traceback (most recent call last):
  File "/usr/bin/rbd-cached", line 160, in <module>
    sys.exit(main(sys.argv))
  File "/usr/bin/rbd-cached", line 25, in main
    return rbd_cached.main(args)
  File "/var/lib/rbd/bin/rbd_cached.py", line 31, in main
    cache1 = cacher.Cacher(cacheDir, cacheSize)
  File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 332, in __init__
  File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 380, in validateCache
  File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 567, in _ensureDirectoriesExist
  File "/usr/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/rbd/cache/vital'
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : Traceback (most recent call last):
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "/usr/bin/rbd-cached", line 160, in <module>
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : sys.exit(main(sys.argv))
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "/usr/bin/rbd-cached", line 25, in main
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : return rbd_cached.main(args)
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "/var/lib/rbd/bin/rbd_cached.py", line 31, in main
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : cache1 = cacher.Cacher(cacheDir, cacheSize)
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 332, in __init__
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 380, in validateCache
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 567, in _ensureDirectoriesExist
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :   File "/usr/lib/python3.7/os.py", line 221, in makedirs
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : mkdir(name, mode)
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : FileNotFoundError
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : [Errno 2] No such file or directory: '/var/lib/rbd/cache/vital'
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:vc_extension:Using the default linux install config file path
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:vc_extension:Could not find a valid vcAddress in the config file, using the db
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:dbsetup:installed DB connection <sqlite3.Connection object at 0x7fbf8ad409d0> for thread <_MainThread(MainThread, started 140460688193344)>
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:networkutil:The configured address: <redacted>
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:vc_extension:Service Address (WAITER_ADDRESS) : 1##.##.##.##
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:cacher:BEGIN validating cache contents
YYYY:MM:DDTHH:MM:SS.Z [12184]ERROR:rbd-cached:Program exiting due to unknown exception
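To check whether an appliance is affected, you can search the log for the error string shown above. The following bash function is a minimal sketch. Its name, check_rbd_vital_error, is hypothetical. The log path and message text are copied from this article. It follows grep's exit-status convention: 0 when the message is found, 1 when it is not, and 2 when the log cannot be read.

```shell
# Hypothetical helper: look for the missing "vital" directory error in rbd-cache.log.
# The default log path and the message text are taken from this article.
check_rbd_vital_error() {
  local log="${1:-/var/log/vmware/rbd/rbd-cache.log}"

  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi

  # -F: match the message literally; -q: only the exit status matters
  if grep -qF "No such file or directory: '/var/lib/rbd/cache/vital'" "$log"; then
    echo "symptom present"
    return 0
  fi

  echo "symptom not found"
  return 1
}
```

On the appliance, call check_rbd_vital_error with no argument to search the default log, or pass another log file as the first argument.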

Environment

  • VMware Cloud Foundation (VCF) 4.x / 5.x

  • vSphere Foundation (VVF) 7.x / 8.x

  • vCenter Server Appliance (VCSA)

Cause

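This issue occurs when the symbolic link from /var/lib/rbd/cache to the /storage partition is removed, or when the cache subdirectory is deleted by a user. While validating its cache, the rbd-cached process then fails to create /var/lib/rbd/cache/vital and exits, which causes the vmware-rbd-watchdog service to fail.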
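You can distinguish these conditions before making any changes. The following bash function is a sketch under stated assumptions. The name diagnose_rbd_cache is hypothetical, and so is its optional root-prefix argument, which lets you try the function against a scratch directory. The function only inspects paths and does not modify anything.

```shell
# Hypothetical helper: classify the state of the Auto Deploy cache link.
# The optional first argument is a root prefix for testing; on a real VCSA,
# call it with no argument so the paths resolve under /.
diagnose_rbd_cache() {
  local root="${1:-}"
  local link="$root/var/lib/rbd/cache"

  if [ -L "$link" ]; then
    if [ ! -d "$link" ]; then
      # The link exists but its target directory is gone (dangling link)
      echo "link-broken"
    elif [ ! -d "$link/vital" ]; then
      # The link resolves, but the vital subdirectory is missing
      echo "vital-missing"
    else
      echo "ok"
    fi
  elif [ -e "$link" ]; then
    # Something occupies the path, but it is not a symbolic link
    echo "not-a-symlink"
  else
    echo "link-missing"
  fi
}
```

On the appliance, run diagnose_rbd_cache with no argument. The link-missing and link-broken results correspond to the missing or broken link handled in Resolution step 3, and vital-missing corresponds to step 4. A not-a-symlink result means something else occupies the link path and should be reviewed by hand.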

Resolution

To resolve this issue, manually restore the missing directory structure and symbolic link on the vCenter Server Appliance.

  1. Access the VCSA: Log in to the vCenter Server Appliance over SSH as root.

  2. Verify Path State: Check the status of the rbd (Remote Boot Daemon) directory. The cache entry should be a symbolic link pointing to /storage/autodeploy/cache:

    ls -ltrh /var/lib/rbd

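  3. Restore Symbolic Link: If the cache link is missing or broken, recreate it so that it points to the persistent storage partition. If a broken link is still present, first remove it with rm /var/lib/rbd/cache. This deletes only the link, not its target. Otherwise, ln -s fails because the link name already exists:

    ln -s /storage/autodeploy/cache /var/lib/rbd/cache
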
  4. Recreate Vital Directory: Create the directory required by the rbd-cached service:

    mkdir /storage/autodeploy/cache/vital

  5. Set Permissions: Set the directory mode to 770 and make the deploy user and group its owner:

    chmod 770 /storage/autodeploy/cache/vital
    chown deploy:deploy /storage/autodeploy/cache/vital

  6. Restart Service: Restart the Auto Deploy watchdog service to initialize the cache:

    service-control --restart vmware-rbd-watchdog
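Steps 3 to 5 can also be combined into one idempotent function, which is convenient when several appliances need the same repair. This is a minimal bash sketch, not a VMware-provided tool. The function name restore_rbd_cache and its optional root prefix are hypothetical. The function makes several assumptions and conservative choices:

  • It refuses to create /storage/autodeploy/cache itself.
  • It replaces only a dangling link, and trusts an existing valid link as-is without re-pointing it.
  • It leaves a non-symlink at /var/lib/rbd/cache alone for manual review.
  • It skips chown when a prefix is given, because the deploy account is assumed to exist only on the appliance.

```shell
# Hypothetical helper combining Resolution steps 3-5; run as root on the VCSA.
# The optional first argument is a root prefix for dry runs against a scratch tree.
restore_rbd_cache() {
  local root="${1:-}"
  local link="$root/var/lib/rbd/cache"
  local target="$root/storage/autodeploy/cache"

  # Conservative choice: do not create the persistent cache directory itself
  if [ ! -d "$target" ]; then
    echo "error: $target does not exist" >&2
    return 1
  fi

  # Step 3: remove a dangling link, then create the link if nothing is at the path
  if [ -L "$link" ] && [ ! -e "$link" ]; then
    rm -f "$link"
  fi
  if [ ! -e "$link" ]; then
    ln -s "$target" "$link" || return 1
  elif [ ! -L "$link" ]; then
    echo "error: $link exists but is not a symbolic link; review it manually" >&2
    return 1
  fi

  # Step 4: create the vital directory only if it is missing
  if [ ! -d "$target/vital" ]; then
    mkdir "$target/vital" || return 1
  fi

  # Step 5: set the mode; chown is skipped under a prefix because the
  # deploy account is assumed to exist only on the appliance
  chmod 770 "$target/vital" || return 1
  if [ -z "$root" ]; then
    chown deploy:deploy "$target/vital" || return 1
  fi
  return 0
}
```

On the appliance, you could run restore_rbd_cache && service-control --restart vmware-rbd-watchdog to apply the repair and step 6 together. Then confirm the service state with service-control --status vmware-rbd-watchdog.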