ESXi hosts fail to initiate boot via Auto Deploy.
The boot process loops or halts with the error: "Could not boot: Connection reset".
Network errors are encountered during the PXE/iPXE phase of the boot process.
/var/log/vmware/rbd/rbd-cache.log reports that the path "/var/lib/rbd/cache/vital" is missing:
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger : FileNotFoundError
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger : :
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger : [Errno 2] No such file or directory: '/var/lib/rbd/cache/vital'
YYYY:MM:DDTHH:MM:SS.Z [10882]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:vc_extension:Using the default linux install config file path
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:vc_extension:Could not find a valid vcAddress in the config file, using the db
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:dbsetup:installed DB connection <sqlite3.Connection object at 0x7ff007bc79d0> for thread <_MainThread(MainThread, started 140668942231360)>
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:networkutil:The configured address: [email protected]
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:vc_extension:Service Address (WAITER_ADDRESS) : 1##.##.##.##
YYYY:MM:DDTHH:MM:SS.Z [11243]INFO:cacher:BEGIN validating cache contents
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:rbd-cached:Program exiting due to unknown exception
Traceback (most recent call last):
  File "/usr/bin/rbd-cached", line 160, in <module>
    sys.exit(main(sys.argv))
  File "/usr/bin/rbd-cached", line 25, in main
    return rbd_cached.main(args)
  File "/var/lib/rbd/bin/rbd_cached.py", line 31, in main
    cache1 = cacher.Cacher(cacheDir, cacheSize)
  File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 332, in __init__
  File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 380, in validateCache
  File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 567, in _ensureDirectoriesExist
  File "/usr/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/rbd/cache/vital'
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : Traceback (most recent call last):
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "/usr/bin/rbd-cached", line 160, in <module>
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : sys.exit(main(sys.argv))
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "/usr/bin/rbd-cached", line 25, in main
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : return rbd_cached.main(args)
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "/var/lib/rbd/bin/rbd_cached.py", line 31, in main
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : cache1 = cacher.Cacher(cacheDir, cacheSize)
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 332, in __init__
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 380, in validateCache
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/waiter/cacher.py", line 567, in _ensureDirectoriesExist
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : File "/usr/lib/python3.7/os.py", line 221, in makedirs
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : mkdir(name, mode)
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : FileNotFoundError
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : :
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger : [Errno 2] No such file or directory: '/var/lib/rbd/cache/vital'
YYYY:MM:DDTHH:MM:SS.Z [11243]ERROR:logutil:IOLogger :
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:vc_extension:Using the default linux install config file path
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:vc_extension:Could not find a valid vcAddress in the config file, using the db
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:dbsetup:installed DB connection <sqlite3.Connection object at 0x7fbf8ad409d0> for thread <_MainThread(MainThread, started 140460688193344)>
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:networkutil:The configured address: [email protected]
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:vc_extension:Service Address (WAITER_ADDRESS) : 1##.##.##.##
YYYY:MM:DDTHH:MM:SS.Z [12184]INFO:cacher:BEGIN validating cache contents
YYYY:MM:DDTHH:MM:SS.Z [12184]ERROR:rbd-cached:Program exiting due to unknown exception
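To confirm this symptom from the appliance shell without reading the whole log, a small sketch such as the following may help. The function name count_vital_errors is hypothetical and not part of any VMware tooling; the default log path and the error text are the ones quoted in this article.

```shell
# Hypothetical helper: count log lines that report the missing 'vital' directory.
# The default path is the log file named in this article; pass another path to override it.
count_vital_errors() {
    local log_file="${1:-/var/log/vmware/rbd/rbd-cache.log}"
    # -c prints the number of matching lines; -F matches the pattern as a fixed string.
    grep -cF "No such file or directory: '/var/lib/rbd/cache/vital'" "$log_file"
}
```

Calling count_vital_errors with no argument reads the default log. A count greater than zero is consistent with the symptom described here.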
VMware Cloud Foundation (VCF) 4.x / 5.x
vSphere Foundation (VVF) 7.x / 8.x
vCenter Server Appliance (VCSA)
This issue occurs if the symbolic link from /var/lib/rbd/cache to the /storage partition is removed, or if the vital subdirectory beneath it is deleted by the user. Either condition causes the vmware-rbd-watchdog service to fail.
To resolve this issue, the missing directory structure and symbolic links must be manually restored on the vCenter Server Appliance.
Access the VCSA: Log into the vCenter Server via SSH as root.
Verify Path State: Check the status of the rbd (Remote Boot Daemon) directory:
ls -ltrh /var/lib/rbd
Restore Symbolic Link: If the cache link is missing or broken, recreate it pointing to the persistent storage partition:
ln -s /storage/autodeploy/cache /var/lib/rbd/cache
Recreate Vital Directory: Create the specific directory required by the rbd-cached service:
mkdir /storage/autodeploy/cache/vital
Set Permissions: Ensure the deploy user has ownership of the new directory:
chmod 770 /storage/autodeploy/cache/vital
chown deploy:deploy /storage/autodeploy/cache/vital
Restart Service: Restart the Auto Deploy watchdog service to initialize the cache:
service-control --restart vmware-rbd-watchdog
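The restore steps above can also be sketched as one repeatable shell function. This is a minimal sketch, not a supported script: the function name restore_rbd_cache and its three arguments are hypothetical, and it assumes GNU ln and mkdir. It uses mkdir -p and ln -sfn in place of the plain commands above so that it can be run more than once without error.

```shell
# Hypothetical helper wrapping the restore steps; safe to run repeatedly.
# Arguments default to the paths and owner used in this article.
restore_rbd_cache() {
    local link="${1:-/var/lib/rbd/cache}"
    local target="${2:-/storage/autodeploy/cache}"
    local owner="${3:-deploy:deploy}"

    # Create the vital directory (and the target directory, if absent).
    mkdir -p "$target/vital" || return 1

    # -e follows symlinks, so it is false for a missing link and for a dangling one.
    if [ ! -e "$link" ]; then
        # -f replaces a dangling link; -n stops ln from following an existing link.
        ln -sfn "$target" "$link" || return 1
    fi

    # Apply the mode and ownership given in the steps above.
    chmod 770 "$target/vital" || return 1
    chown "$owner" "$target/vital" || return 1
}
```

After running restore_rbd_cache as root, restart the service with service-control --restart vmware-rbd-watchdog. Running service-control --status vmware-rbd-watchdog afterwards shows whether the service is running.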