ESXi PSOD #PF Exception 14 in OCFlush during Storage Replication Operations.

Article ID: 437318

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

The ESXi host fails with a Purple Screen of Death (PSOD) and an error message similar to:

@BlueScreen: #PF Exception 14 in world 2098009:OCFlush IP 0x4200389489c4 addr 0x350

The backtrace may contain frames from the VMFS, LVM, or OCFlush code paths.
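When triaging many support bundles, it can help to match the PSOD banner against this article's signature automatically. The following is a minimal sketch that assumes the banner has the exact layout quoted above; the regular expression, the `PfSignature` class, and the 4 KiB "near-NULL" threshold are illustrative assumptions, not part of any VMware tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout of the #PF banner quoted in this article; other PSOD
# variants may format the line differently.
PSOD_PF_RE = re.compile(
    r"#PF Exception 14 in world (?P<world_id>\d+):(?P<world_name>\S+)"
    r"\s+IP (?P<ip>0x[0-9a-fA-F]+)\s+addr (?P<addr>0x[0-9a-fA-F]+)"
)

# Heuristic (assumption): a faulting address inside the first 4 KiB page is
# treated as "near-NULL", i.e. a NULL base pointer plus a small field offset.
NEAR_NULL_LIMIT = 0x1000


@dataclass
class PfSignature:
    world_id: int
    world_name: str
    ip: int
    fault_addr: int

    @property
    def near_null(self) -> bool:
        return self.fault_addr < NEAR_NULL_LIMIT

    @property
    def matches_ocflush_issue(self) -> bool:
        # The faulting world in this article is OCFlush.
        return self.world_name == "OCFlush"


def parse_pf_line(line: str) -> Optional[PfSignature]:
    """Return the parsed #PF signature, or None if the line does not match."""
    m = PSOD_PF_RE.search(line)
    if m is None:
        return None
    return PfSignature(
        world_id=int(m["world_id"]),
        world_name=m["world_name"],
        ip=int(m["ip"], 16),
        fault_addr=int(m["addr"], 16),
    )


if __name__ == "__main__":
    banner = ("@BlueScreen: #PF Exception 14 in world 2098009:OCFlush "
              "IP 0x4200389489c4 addr 0x350")
    sig = parse_pf_line(banner)
    print(sig.world_name, hex(sig.fault_addr), sig.near_null)
```

Matching on the world name and a near-NULL address together narrows the search more than matching on "Exception 14" alone, since many unrelated PSODs are page faults.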

PSOD Screenshot for reference:

Environment

VMware ESXi 7.x / 8.x.

Cause

This issue is caused by a race condition within the ESXi storage stack (specifically the VMFS/LVM layer) during an All Paths Down (APD) recovery sequence.

When a storage device loses all paths (APD) and later recovers (APD exit), a race can occur if the backing device was unregistered or remapped at the array level while the ESXi host still held an active, open reference to the VMFS volume.

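The kernel clears its internal volume reference, but a later metadata synchronization operation (such as the OCFlush world) still dereferences it and reads from a NULL or invalid memory address. The CPU raises a page fault (Exception 14) and the host halts. The low faulting address in the PSOD message (addr 0x350) is consistent with this: it is the kind of address produced when a field at a small offset is read through a NULL pointer.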
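The ordering problem can be illustrated with a toy event-replay model. This is a sketch only, not ESXi code: `VolumeHandle`, `HostModel`, the event names, and the device name `naa.example` are all hypothetical. It shows that a flush after an array-side unregister hits a cleared reference, while unmounting the volume first closes the handle so no flush runs against it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VolumeHandle:
    # Hypothetical stand-in for the host's open reference to a VMFS volume.
    label: str
    backing_device: Optional[str]  # None once the reference has been cleared


@dataclass
class HostModel:
    handle: VolumeHandle
    mounted: bool = True
    log: List[str] = field(default_factory=list)

    def unmount_volume(self) -> None:
        # Resolution step: the host closes its handle before array changes.
        self.mounted = False
        self.log.append("volume unmounted, handle closed")

    def apd_enter(self) -> None:
        self.log.append("APD enter: all paths lost, handle still open")

    def array_unregister_device(self) -> None:
        # Array-side unregister/remap: the internal reference is cleared.
        self.handle.backing_device = None
        self.log.append("array: device unregistered, reference cleared")

    def apd_exit(self) -> None:
        self.log.append("APD exit")

    def metadata_flush(self) -> None:
        # Models an OCFlush-style flush dereferencing the volume reference.
        if not self.mounted:
            self.log.append("flush skipped: volume not mounted")
            return
        if self.handle.backing_device is None:
            raise RuntimeError(
                f"flush on {self.handle.label}: stale reference (models #PF)")
        self.log.append(f"flush ok via {self.handle.backing_device}")


def replay(events: List[str]) -> str:
    """Run the named events in order; return 'ok' or a 'fault: ...' string."""
    host = HostModel(VolumeHandle("ds01", "naa.example"))
    for name in events:
        try:
            getattr(host, name)()
        except RuntimeError as exc:
            return f"fault: {exc}"
    return "ok"


if __name__ == "__main__":
    print(replay(["apd_enter", "array_unregister_device",
                  "apd_exit", "metadata_flush"]))
```

In the real kernel the fault is a CPU page fault rather than a Python exception; the model only captures the ordering of events that leads to it.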

Resolution

To prevent this condition, ensure storage orchestration operations are coordinated with the ESXi host state:

  1. Quiesce I/O: Before performing storage-layer failover, failback, or LUN remapping, ensure all Virtual Machines on the affected datastore are migrated (vMotion) or powered off.

  2. Maintenance Mode: When applying global storage changes, place the ESXi host in Maintenance Mode so that no active handles remain on the volumes.

  3. Unmount Volumes: Properly unmount VMFS datastores and detach the underlying devices from the ESXi hosts before removing LUN masking or changing replication states on the storage array.

  4. Coordination: Review Site Recovery Manager (SRM) or third-party replication scripts and introduce a delay between "Device Unregistration" and "Volume Teardown" so the ESXi storage stack has time to update its object state.
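A replication or orchestration script could enforce these steps as a preflight gate. The following is a minimal sketch under stated assumptions: `HostState`, `preflight_blockers`, and `run_with_settle_delay` are hypothetical helpers, the host state is assumed to be collected by your own tooling, and the settle delay value is not specified by this article. The two esxcli strings are the standard unmount and detach commands; verify them against the esxcli reference for your ESXi release before use.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HostState:
    # Hypothetical snapshot gathered before an array-side change.
    running_vms_on_datastore: int
    in_maintenance_mode: bool
    datastore_mounted: bool
    device_attached: bool


def preflight_blockers(state: HostState, global_change: bool) -> List[str]:
    """Return the reasons the array-side change should not start yet."""
    blockers = []
    if state.running_vms_on_datastore > 0:
        blockers.append("quiesce I/O: migrate or power off VMs on the datastore")
    if global_change and not state.in_maintenance_mode:
        blockers.append("enter maintenance mode")
    if state.datastore_mounted:
        blockers.append("unmount the VMFS datastore")
    if state.device_attached:
        blockers.append("detach the backing device")
    return blockers


def host_side_commands(label: str, device: str) -> List[str]:
    # Unmount by volume label, then detach the device by its ID.
    return [
        f"esxcli storage filesystem unmount -l {label}",
        f"esxcli storage core device set --state=off -d {device}",
    ]


def run_with_settle_delay(unregister: Callable[[], None],
                          teardown: Callable[[], None],
                          settle_seconds: float,
                          sleep: Callable[[float], None] = time.sleep) -> None:
    # Step 4: pause between device unregistration and volume teardown so
    # the ESXi storage stack can update its object state.
    unregister()
    sleep(settle_seconds)
    teardown()


if __name__ == "__main__":
    state = HostState(running_vms_on_datastore=2, in_maintenance_mode=False,
                      datastore_mounted=True, device_attached=True)
    for reason in preflight_blockers(state, global_change=True):
        print("blocked:", reason)
```

Injecting `sleep` keeps the delay testable without waiting in real time; a production script would also re-check the host state after the pause rather than rely on the delay alone.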

Additional Information

Understanding All Paths Down (APD) and Permanent Device Loss (PDL)

Interpreting an ESXi/ESX host PSOD