An ESXi host does not initiate a storage path failover despite path instability when the Test Unit Ready (TUR) command repeatedly returns retry requests. This behavior results in:
/var/log/vmkernel.log or hostd.log showing:SCSI_HOST_BUS_BUSY 0x02SCSI_HOST_SOFT_ERROR 0x0bSCSI_HOST_RETRY 0x0cVMK_STORAGE_RETRY_OPERATIONNMP device "naa.XXX" state in doubt; requested fast path state updateWhen a storage path experiences an issue, the ESXi host sends a TUR command to confirm the path status. If the storage array or controller returns a retry operation request (e.g., due to a busy host bus), the ESXi host continues to retry the command indefinitely instead of marking the path as dead and triggering a failover to a healthy path.
To resolve this issue, enable the action_OnRetryErrors option. This configuration allows the ESXi host to mark a problematic path as dead when retry errors occur, which triggers an immediate failover to an alternative working path.
IMPORTANT: Always test configuration changes in a non-production environment first. Verify with your storage vendor that action_OnRetryErrors is recommended for your specific array firmware.
To configure the option, perform these steps:
Enable the option for a specific device: Run the following command, replacing naa.### with your specific device identifier: # esxcli storage nmp satp generic deviceconfig set -c enable_action_OnRetryErrors -d naa.###
Verify the configuration: Check the status of the device configuration by running: # esxcli storage nmp device list -d naa.###
The output should show the option in the Config line: Config: {navireg ipfilter action_OnRetryErrors}
Disable the option (if needed): To revert the change, run: # esxcli storage nmp satp generic deviceconfig set -c disable_action_OnRetryErrors -d naa.###
Note: The enable_action_OnRetryErrors configuration is persistent across host reboots.