Remote Boot Device Failure Monitoring in vSphere 9.0

Article ID: 380716

Updated On: 06-18-2025

Products

VMware vSphere ESXi

Issue/Introduction

vSphere 9.0 adds monitoring of remote boot devices that contain ESX-OSData partitions for critical failures. vSphere expects such a remote boot device to be highly available, but the device can still fail for reasons such as All Paths Down (APD) or Permanent Device Loss (PDL). ESX continuously monitors the remote boot device for these conditions. If one occurs and the device fails to recover within a certain interval, it is treated as a critical error: the ESX host is halted to avoid running in a failed state and to avoid corruption. In addition, the vCenter Server may receive VMkernel Observation (VOB) system events for these failures. vSphere monitors the following failure scenarios.

  • All Paths Down (APD)

    An All-Paths-Down (APD) situation occurs when all paths to the boot device are down. It begins with an 'All Paths Down' start event. The ESX host then enters a timeout period (140 seconds by default) during which it keeps trying to re-establish connectivity to the boot device. If the boot device has not recovered when this period ends, an 'All Paths Down' timeout event occurs. As the event messages below indicate, the host then allows a further 20 seconds for recovery (160 seconds in total from the start event) before it is halted with a purple diagnostic screen such as the following:

    **CRITICAL**: Lost access to boot device 'eui.xxxxxxxxxxxx' - All paths are down.

    Module(s) involved in panic: [bootdevmon Built on:...]

    The vCenter Server may receive the following VOB events from the ESX host:

    | Event Message | Event Type | Event ID | Note |
    |---|---|---|---|
    | Boot Device with identifier 'eui.xxxxxxxxxxxx' has entered the state: All Paths Down. Host will be halted if the boot device fails to recover in 160 seconds. | warning | esx.problem.bootdevice.apd.start | |
    | Boot Device with identifier 'eui.xxxxxxxxxxxx' has entered the state: All Paths Down Timeout. Host will be halted if the boot device fails to recover in 20 seconds. | error | esx.problem.bootdevice.apd.timeout | |
    | Boot Device with identifier 'eui.xxxxxxxxxxxx' has exited from the state: All Paths Down. | info | esx.clear.bootdevice.apd.exit | If the boot device is recovered from APD state in 160 seconds. |
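
The timings in the event messages above can be read as a simple countdown. The following is a minimal sketch, assuming that the 160-second figure is the 140-second default APD timeout plus the 20-second window reported by the timeout event. The function and variable names are hypothetical and are not part of ESX.

```shell
# Assumed values, taken from the event messages in this article.
APD_TIMEOUT_SECS=140   # default APD timeout period
HALT_GRACE_SECS=20     # time left when the APD timeout event fires

# Print the seconds remaining before the host would be halted, given the
# seconds elapsed since the APD start event. Prints 0 once the deadline
# has passed.
apd_seconds_until_halt() {
    elapsed=$1
    deadline=$((APD_TIMEOUT_SECS + HALT_GRACE_SECS))
    remaining=$((deadline - elapsed))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}
```

For example, `apd_seconds_until_halt 140` corresponds to the moment the timeout event is raised, when 20 seconds remain.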


  • Permanent Device Loss (PDL)

    A storage device is considered to be in the permanent device loss (PDL) state when it becomes permanently unavailable to your ESX host. Typically, the PDL condition occurs when a device is unintentionally removed, when its unique ID changes, or when the device experiences an unrecoverable hardware error. When a PDL occurs and the boot device fails to recover, the ESX host is halted with the following purple diagnostic screen:

    **CRITICAL**: Lost access to boot device 'eui.xxxxxxxxxxxx' - Permanent device loss.

    Module(s) involved in panic: [bootdevmon Built on:...]

    The vCenter Server may receive the following VOB events from the ESX host:

    | Event Message | Event Type | Event ID | Note |
    |---|---|---|---|
    | Boot Device with identifier 'eui.xxxxxxxxxxxx' has entered the state: Permanent Device Loss. Host will be halted if the boot device fails to recover in 20 seconds. | error | esx.problem.bootdevice.pdl | |
    | Boot Device with identifier 'eui.xxxxxxxxxxxx' is accessible again. | info | esx.clear.bootdevice.pdl.restored | If the boot device is recovered from PDL state in 20 seconds. |
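
A monitoring or alerting script that consumes these events may want to map each event ID to its event type. The following is a minimal sketch that covers only the event IDs listed in this article; the function name is hypothetical, and how the events are obtained from vCenter Server is outside its scope.

```shell
# Print the event type documented in this article for a boot device
# VOB event ID. Prints "unknown" and returns 1 for any other ID.
bootdev_event_severity() {
    case "$1" in
        esx.problem.bootdevice.apd.start)        echo "warning" ;;
        esx.problem.bootdevice.apd.timeout)      echo "error" ;;
        esx.clear.bootdevice.apd.exit)           echo "info" ;;
        esx.problem.bootdevice.pdl)              echo "error" ;;
        esx.clear.bootdevice.pdl.restored)       echo "info" ;;
        esx.problem.bootdevice.monitor.disabled) echo "warning" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

The two "error" events are the ones that are followed by a host halt if the device does not recover within the stated 20 seconds, so they are the natural candidates for immediate alerting.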


  • Remote boot device loss on ESX boot.

    During boot, ESX checks the availability of the system storage on the boot device (local or remote). If the boot device is not found or is inaccessible for any reason, and the host was not in maintenance mode before the reboot, ESX stops the boot and displays a purple diagnostic screen such as the following:

    The system has found a problem on your machine and cannot continue.

    Unable to find boot device: '<Device ID>'. 

    If ESX was in maintenance mode before the reboot, the host stays in maintenance mode instead of halting. On the ESX host, the SysAlert "Failed to find boot device after 120 seconds." is raised. In the vSphere Client, the ESX host is kept in maintenance mode, and the "Exit maintenance mode" operation fails with the error "A general system error occurred: Cannot exit maintenance mode due to failure during boot. A critical failure was detected during system boot. The host is currently not able to exit maintenance mode and run workloads. Refer to VMware KB93107 for details".

Environment

ESX 9.0

Cause

This problem occurs when the boot device is inaccessible during ESX boot. This can occur for various reasons such as permanent device loss, misconfiguration of ESX network/storage settings, connectivity issues with the fabric, or problems with the Storage Array.

Resolution

To resolve this issue, identify and fix the cause of the storage connectivity failure, for example a fault in the storage array, a SAN switch, or the device itself.

To investigate host-side issues, it may be necessary to disable boot device monitoring temporarily, using either of the methods below.

Note: Disable boot device monitoring only for debugging boot device issues. Once the issue is resolved, re-enable boot device monitoring.


Temporary disabling:

Add the boot option 'haltOnBootDeviceLoss=FALSE' as follows:

  • Start the host.
  • When the ESX boot loader window appears, press Shift+O to edit boot options.
  • Add the text haltOnBootDeviceLoss=FALSE.
  • Hit <Enter> to proceed with the boot.

    Once boot device monitoring is disabled, vCenter Server may receive the following VOB event from the ESX host when the host boots.

    | Event Message | Event Type | Event ID |
    |---|---|---|
    | Host is not in compliance, remote boot device monitoring is disabled. | warning | esx.problem.bootdevice.monitor.disabled |


  • Debug the issue
  • Reboot the host to enable boot device monitoring again.
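
To confirm whether monitoring is currently disabled, the kernel setting can be inspected from the ESX shell. The following is a minimal sketch, assuming that `esxcli system settings kernel list -o haltOnBootDeviceLoss` prints a row whose third and fourth columns are the Configured and Runtime values; verify the column order on your build. The function name is hypothetical.

```shell
# Print the configured and runtime values of haltOnBootDeviceLoss.
# Assumes the usual column layout of "esxcli system settings kernel list":
#   Name  Type  Configured  Runtime  Default  Description
bootdev_monitor_state() {
    esxcli system settings kernel list -o haltOnBootDeviceLoss |
        awk '$1 == "haltOnBootDeviceLoss" {
            print "configured=" $3 " runtime=" $4
        }'
}
```

Run `bootdev_monitor_state` after the host has booted. With the Shift+O boot option, one would expect only the runtime value to change, but that behavior is an assumption and has not been verified here.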

Persistent disabling (Persistent when boot device is accessible):

  • Put the host in maintenance mode.
  • SSH to the host and run the following esxcli command:

    # esxcli system settings kernel set -s haltOnBootDeviceLoss -v FALSE

  • Reboot the host

    Once boot device monitoring is disabled, vCenter Server may receive the following VOB event from the ESX host when the host boots.

    | Event Message | Event Type | Event ID |
    |---|---|---|
    | Host is not in compliance, remote boot device monitoring is disabled. | warning | esx.problem.bootdevice.monitor.disabled |


  • Debug the issue.
  • Re-enable boot device failure monitoring (persistent when the boot device is accessible):

    # esxcli system settings kernel set -s haltOnBootDeviceLoss -v TRUE

  • Reboot the host
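
The persistent disable and re-enable steps use the same esxcli command with a different value, so they can be wrapped in one helper to reduce typing errors. The following is a minimal sketch that uses only the command shown in this article; the function name is hypothetical, and it does not put the host in maintenance mode or reboot it.

```shell
# Persistently set haltOnBootDeviceLoss to TRUE or FALSE.
# Rejects any other value so that a typo never reaches esxcli.
set_bootdev_monitoring() {
    case "$1" in
        TRUE|FALSE) ;;
        *)
            echo "usage: set_bootdev_monitoring TRUE|FALSE" >&2
            return 2
            ;;
    esac
    esxcli system settings kernel set -s haltOnBootDeviceLoss -v "$1" || return 1
    echo "haltOnBootDeviceLoss set to $1; reboot the host to apply the change"
}
```

For example, run `set_bootdev_monitoring FALSE` before debugging and `set_bootdev_monitoring TRUE` afterwards, rebooting the host after each call as described in the steps above.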