ESXi hosts become unresponsive - SCSI Persistent Reservation Failure

Article ID: 430813


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

A Reservation Conflict generally means that the host initiator trying to access the LUN is denied access because another initiator has reserved it. Persistent Reservations (SCSI-3 PR) are a storage protocol feature that lets multiple hosts manage access to shared block storage (LUNs) in clustered environments. Unlike older reservations (SCSI-1/2), these persist across SCSI bus resets, enabling a surviving host to "preempt" or take over locks from a failed host, ensuring data consistency and high availability.


How SCSI-3 PR Works

SCSI-3 PR is a cluster-aware locking mechanism. Unlike the older SCSI-1/2 Reserve/Release commands, which locked the entire LUN and did not survive a bus reset, SCSI-3 PR uses Registrations and Reservations: multiple hosts can register their keys, but only a registered initiator can hold the reservation. These locks survive host reboots and bus resets, allowing a surviving node to "preempt" a failed node's lock to maintain high availability without data inconsistencies.
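The register/reserve/preempt flow described above can be sketched as a toy model. This is an illustrative simulation only, not real SCSI code: the hosts and keys below are made-up placeholders, and a real array enforces these rules in firmware.

```python
# Toy model of SCSI-3 Persistent Reservation semantics (illustration only).
class PersistentReservationLun:
    def __init__(self):
        self.registrations = {}   # host -> registered reservation key
        self.holder = None        # host currently holding the reservation

    def register(self, host, key):
        # Any initiator may register a key on the LUN.
        self.registrations[host] = key

    def reserve(self, host):
        # Only a registered initiator can take the reservation.
        if host not in self.registrations:
            raise PermissionError(f"{host} is not registered")
        if self.holder is not None and self.holder != host:
            raise PermissionError("reservation conflict")
        self.holder = host

    def bus_reset(self):
        # Unlike SCSI-2 Reserve/Release, PR state survives a bus reset.
        pass

    def preempt(self, host, victim_key):
        # A surviving registered node removes the failed node's key
        # and takes over its reservation.
        if host not in self.registrations:
            raise PermissionError(f"{host} is not registered")
        for h in [h for h, k in self.registrations.items() if k == victim_key]:
            del self.registrations[h]
            if self.holder == h:
                self.holder = None
        self.holder = host

# Example: hostB takes over after hostA fails (hypothetical hosts/keys).
lun = PersistentReservationLun()
lun.register("hostA", 0xA1)
lun.reserve("hostA")
lun.register("hostB", 0xB2)
lun.bus_reset()                        # registrations and reservation survive
lun.preempt("hostB", victim_key=0xA1)  # hostB evicts failed hostA's key
print(lun.holder)                      # hostB
```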

Environment

VMware ESXi 8.0.3

Cause

This error occurs when an ESXi host attempts a SCSI-3 Persistent Reservation (PR) command on a specific LUN, but the operation fails. While a "Reservation Conflict" means another host has a lock, an "I/O Error" typically means the command itself was never successfully processed by the storage target due to a transport failure or a busy controller.

SCSI reservations are the locking mechanism VMware uses to prevent multiple hosts from writing to the same block of storage at the same time and corrupting your data. When you see an I/O error, it means the host sent the lock request, but the storage network or the array itself dropped the request or returned an error.
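The device-status byte in the VMkernel log (the D: field) is what distinguishes a reservation conflict from other failures. As a sketch (this is not a vSphere API; the numeric values come from the SCSI standard):

```python
# Decode the SCSI device-status byte ("D:0x..") seen in vmkernel.log lines.
# Values are defined by the SCSI architecture standard (SAM).
SCSI_DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK_CONDITION",       # command failed; sense data explains why
    0x08: "BUSY",                  # target temporarily cannot accept the command
    0x18: "RESERVATION_CONFLICT",  # another initiator holds the reservation
}

def decode_device_status(status: int) -> str:
    return SCSI_DEVICE_STATUS.get(status, f"UNKNOWN(0x{status:x})")

print(decode_device_status(0x18))  # RESERVATION_CONFLICT
print(decode_device_status(0x02))  # CHECK_CONDITION
```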

Log Messages from the Storage Array:

2026-02-20 17:01:50.35 GMT     1 54307799 Internal Communication Informational   Scsi reserve/release persistent event undefined  General         SERVICE (REGISTER-IGNORE) LUN (#####AC000000000000000##########) host (200##########32C) key (0000000000000000) svkey (000000######A0DE) port (1:#:4)
2026-02-20 17:01:50.36 GMT     2 49781270 Internal Communication Informational   Scsi reserve/release persistent event undefined  General         SERVICE (REGISTER-IGNORE) LUN (#####AC000000000000000##########) host (200##########323) key (0000000000000000) svkey (00000025B5A1A044) port (2:#:1)
2026-02-20 17:01:50.30 GMT     3 54613794 Internal Communication Informational   Scsi reserve/release persistent event undefined  General         SERVICE (REGISTER-IGNORE) LUN (#####AC000000000000000##########) host (200##########337) key (0000000000000000) svkey (000000#####1A0F0) port (3:#:1)
2026-02-20 17:01:50.30 GMT     3 54613795 Internal Communication Informational   Scsi reserve/release persistent event undefined  General         SERVICE (REGISTER-IGNORE) LUN (#####AC000000000000000##########) host (200##########337) key (0000000000000000) svkey (00000######1A0F0) port (3:#:1)
2026-02-20 17:01:50.41 GMT     0 62214392 Internal Communication Informational   Scsi reserve/release persistent event undefined  General         SERVICE (REGISTER-IGNORE) LUN (#####AC000000000000000##########) host (200##########32B) key (0000000000000000) svkey (00000######1A0DA) port (0:#:2)


VMkernel.log:

2026-02-20T02:39:10.114Z In(182) vmkernel: cpu3:2099273)FDS: 1111: Persistent register failed on naa.#######0000000000000007#########:1; status = I/O error
2026-02-20T02:39:10.124Z In(182) vmkernel: cpu3:2099273)FDS: 1111: Persistent register failed on naa.#######0000000000000007#########:1; status = I/O error
2026-02-20T02:39:10.135Z In(182) vmkernel: cpu3:2099273)FDS: 1111: Persistent register failed on naa.#######0000000000000007#########:1; status = I/O error
2026-02-20T02:39:10.143Z In(182) vmkernel: cpu3:2099273)FDS: 1111: Persistent register failed on naa.#######0000000000000007#########:1; status = I/O error

 

2026-02-20T02:38:57.903Z In(182) vmkernel: cpu38:2098179)ScsiDeviceIO: 4686: Cmd(0x45d9e7a3c080) 0x5f, CmdSN 0x5211 from world 0 to dev "naa.#####2ac00000000000000070000####" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x55 0x4
2026-02-20T02:38:58.044Z In(182) vmkernel: cpu38:2098179)ScsiDeviceIO: 4686: Cmd(0x45d9e7a3c080) 0x5f, CmdSN 0x5278 from world 0 to dev "naa.#####2ac00000000000000070000####" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x55 0x4
2026-02-20T02:38:58.055Z In(182) vmkernel: cpu52:2098180)ScsiDeviceIO: 4686: Cmd(0x45d9fd1cc340) 0x5f, CmdSN 0x527f from world 0 to dev "naa.#####2ac00000000000000070000####" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x55 0x4
2026-02-20T02:38:58.070Z In(182) vmkernel: cpu38:2098179)ScsiDeviceIO: 4686: Cmd(0x45d9fd1e3740) 0x5f, CmdSN 0x528c from world 0 to dev "naa.#####2ac00000000000000070000####" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x55 0x4

 

Device Status: [0x2] CHECK_CONDITION
This status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESXi storage stack sends SCSI command 0x3 (REQUEST SENSE) to retrieve the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after "Valid sense data" in the order Sense Key, Additional Sense Code, ASC Qualifier.

Additional Sense Data: 55/04 (INSUFFICIENT REGISTRATION RESOURCES)
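Pulling the status and sense bytes out of a ScsiDeviceIO line can be sketched as below. The regex is an assumption about the log layout shown above, not an official vSphere parsing API, and the naa ID is a placeholder.

```python
import re

# Sample line modeled on the vmkernel.log excerpts above (placeholder device ID).
LINE = ('ScsiDeviceIO: 4686: Cmd(0x45d9e7a3c080) 0x5f, CmdSN 0x5211 from world 0 '
        'to dev "naa.xxx" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x55 0x4')

# Extract host/device/plugin status and the three sense bytes.
m = re.search(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)",
    LINE,
)
host, device, plugin, sense_key, asc, ascq = (int(x, 16) for x in m.groups())

assert device == 0x02                # CHECK_CONDITION
assert sense_key == 0x5              # ILLEGAL REQUEST
assert (asc, ascq) == (0x55, 0x04)   # INSUFFICIENT REGISTRATION RESOURCES
print(f"sense {sense_key:#x}/{asc:02x}/{ascq:02x}")
```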

Resolution

1. Try clearing the reservation via ESXi 

To reset the reservation on a specific device, run this command:

COMMAND: esxcli storage core device reservation set --action=reset --device <naa.ID>

WARNING: Manually clearing reservations while hosts are actively writing to the LUN can cause severe data inconsistencies.

2. Storage Vendor Intervention

If the esxcli command fails to clear the lock, involve your storage vendor. Array vendors (Pure, Dell, NetApp, HPE, Hitachi) can clear SCSI reservations on a specific volume/LUN without affecting the rest of the storage or requiring a host reboot.

3. Last Resort: Rolling Reboot

If the storage vendor is unable to clear the reservations, or the LUN is clear but the ESXi hosts still report I/O errors, perform a rolling reboot: reboot one host at a time after migrating its VMs.