An ESXi host experiences a Purple Screen of Death (PSOD) with an error message similar to: @BlueScreen: #PF Exception 14 in world 2098009:OCFlush IP 0x4200389489c4 addr 0x350
The backtrace may include references to the VMFS, LVM, or OCFlush modules.
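When triaging collected PSOD text, it can help to pull the fields out of the error line and compare them against this signature. The following is a minimal sketch, not a supported tool: the regular expression is fitted only to the sample line quoted above, and the helper names (`parse_psod_line`, `matches_signature`) are hypothetical. Other ESXi builds may format the line differently.

```python
import re

# Pattern fitted to the single sample line in this article (assumption:
# other builds may use a different layout, in which case this returns None).
PSOD_RE = re.compile(
    r"#PF Exception (?P<exception>\d+) in world (?P<world_id>\d+):(?P<world_name>\S+)"
    r"\s+IP (?P<ip>0x[0-9a-fA-F]+)\s+addr (?P<addr>0x[0-9a-fA-F]+)"
)


def parse_psod_line(line):
    """Return the fields of a #PF PSOD line as a dict, or None if it does not match."""
    m = PSOD_RE.search(line)
    if m is None:
        return None
    return {
        "exception": int(m["exception"]),
        "world_id": int(m["world_id"]),
        "world_name": m["world_name"],
        "ip": int(m["ip"], 16),
        "addr": int(m["addr"], 16),
    }


def matches_signature(fields):
    """Hypothetical check: a page fault (Exception 14) raised from an OCFlush world."""
    return (
        fields is not None
        and fields["exception"] == 14
        and fields["world_name"] == "OCFlush"
    )


sample = "@BlueScreen: #PF Exception 14 in world 2098009:OCFlush IP 0x4200389489c4 addr 0x350"
fields = parse_psod_line(sample)
print(fields, matches_signature(fields))
```

A match on the error line alone is not conclusive; the backtrace should still be checked for the VMFS, LVM, or OCFlush references mentioned above.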
VMware ESXi 7.x / 8.x.
This issue is caused by a race condition within the ESXi storage stack (specifically the VMFS/LVM layer) during an All Paths Down (APD) recovery sequence.
When a storage device experiences a transient disconnect (APD) and subsequently recovers (APD Exit), a timing discrepancy can occur if the backing storage device is unregistered or remapped at the array level while the ESXi host still holds an active, open reference to the VMFS volume.
The kernel clears its internal volume reference, but a subsequent metadata synchronization operation (such as OCFlush) still tries to use that reference. Because the reference is now null or invalid, the access raises a Page Fault (Exception 14) and the system halts.
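The ordering problem can be illustrated with a toy model. This is only a sketch of the general "reference cleared while a later operation still uses it" pattern; it is not VMkernel code, the class and method names are hypothetical, and the guarded variant is shown for contrast rather than as a description of any actual fix. The interleaving is forced sequentially so the outcome is deterministic.

```python
class Volume:
    """Toy stand-in for an open VMFS volume object."""

    def __init__(self, name):
        self.name = name
        self.dirty_blocks = 3


class HostVolumeHandle:
    """Toy stand-in for the host's reference to a volume (hypothetical)."""

    def __init__(self, volume):
        self.volume = volume

    def teardown(self):
        # Models the device being unregistered: the reference is cleared.
        self.volume = None

    def flush_unguarded(self):
        # Uses the reference without checking it; fails if it was cleared.
        return self.volume.dirty_blocks

    def flush_guarded(self):
        # Reads the reference once and checks it before use.
        vol = self.volume
        if vol is None:
            return 0
        return vol.dirty_blocks


handle = HostVolumeHandle(Volume("datastore1"))
handle.teardown()  # teardown wins the race

try:
    handle.flush_unguarded()
    outcome = "ok"
except AttributeError:
    # Python's analogue of the fault: attribute access on None.
    outcome = "fault"

print(outcome, handle.flush_guarded())
```

In Python the unguarded access surfaces as an exception that can be caught; in a kernel, the equivalent invalid access is a page fault that halts the host.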
To prevent this condition, ensure storage orchestration operations are coordinated with the ESXi host state:
Quiesce I/O: Before performing storage-layer failover, failback, or LUN remapping, ensure that all Virtual Machines on the affected datastore are migrated with vMotion or powered off.
Maintenance Mode: If storage changes are being applied globally, place the ESXi host in Maintenance Mode first so that no active handles remain on the volumes.
Unmount Volumes: Unmount the VMFS datastores and detach the underlying devices from the ESXi hosts before removing LUN masking or changing replication states on the storage array.
Coordination: Review Site Recovery Manager (SRM) or third-party replication scripts to confirm that they introduce a delay between "Device Unregistration" and "Volume Teardown", giving the ESXi storage stack time to update its object state.
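The ordering in the steps above can be sketched as a small planning helper. This is a sketch under stated assumptions, not a supported procedure: the function names, the datastore label `datastore1`, and the device identifier `naa.EXAMPLE` are hypothetical placeholders, and the script only builds and prints the `esxcli` command lines rather than running them. The `wait_until` helper shows one way an orchestration script might gate the next step on a state check instead of a fixed sleep; the predicate it polls is left to the caller.

```python
import shlex
import time


def build_removal_plan(datastore_label, device_id, enter_maintenance=False):
    """Return esxcli command lines, in order, for removing one device from a host."""
    plan = []
    if enter_maintenance:
        # Optional first step: put the host into Maintenance Mode.
        plan.append(["esxcli", "system", "maintenanceMode", "set", "--enable", "true"])
    # Unmount the VMFS datastore by label before touching the device.
    plan.append(["esxcli", "storage", "filesystem", "unmount", "-l", datastore_label])
    # Detach the underlying device only after the unmount.
    plan.append(["esxcli", "storage", "core", "device", "set", "-d", device_id, "--state=off"])
    return plan


def wait_until(predicate, timeout_s, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is true or timeout_s elapses; return the final result."""
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Hypothetical label and device ID, for illustration only.
plan = build_removal_plan("datastore1", "naa.EXAMPLE", enter_maintenance=True)
for cmd in plan:
    print(shlex.join(cmd))
```

Array-side changes (LUN masking, replication state) would follow only after every command in the plan has completed on each host. A script could call `wait_until` between its unregistration and teardown steps with a predicate that confirms the host no longer lists the datastore as mounted.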