ESXi 8.0.1 hosts may experience a Purple Screen of Death (PSOD) with a Page Fault Exception (#PF, exception 14) during VMFS operations. This typically occurs under intensive I/O on VMFS-6 volumes.
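Symptoms:
1. The host fails with a PSOD reporting "#PF Exception 14".
2. The backtrace includes VMFS heap functions such as FS3_MemFree or FS3_MemAlloc, called from the VMFS resource (Res6*) or unmap code.
3. The fault occurs in a VMFS resource-management or I/O world, such as res3HelperQu, FSUnmapManag, or fssAIO (world names appear truncated in the PSOD output).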
This issue manifests in several distinctive patterns that can help with identification:
Pattern 1 - Resource Helper Queue PSOD:
```
#PF Exception 14 in world 2102584:res3HelperQu IP 0x4200135590e2 addr 0x100000008
```
Key identifying stack trace elements:
```
#0 DLM_free (msp=..., mem=..., allowTrim=...)
#1 Heap_Free (heap=...)
#2 FS3_HeapMemFree (heapID=...)
#3 FS3_MemFree (realPtr=...)
#4 Res6NewFreeClusterEntry (rce=...)
#5 Res6NewFlushCache (resType=...)
#6 Res6FlushCacheInt (resType=...)
#7 Res6FlushCache (resType=...)
#8 Res3_FlushCachesVMFS6 (fsData=...)
#9 Res3FlushHelperVMFS6 (data=...)
```
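Each pattern begins with a PSOD header line of the same shape. The Python sketch below extracts the exception vector, world ID, world name, instruction pointer and faulting address from such a line, so that headers collected from several hosts can be compared side by side. The regular expression and the `PFHeader` / `parse_pf_header` names are hypothetical, inferred only from the header lines quoted in this article; real PSOD output may contain variations they do not cover.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed header shape, inferred from the examples in this article:
#   #PF Exception 14 in world <id>:<name> IP <hex> addr <hex>
# The world name is assumed to contain no spaces (it is shown truncated,
# e.g. "FSUnmapManag").
PF_HEADER = re.compile(
    r"#PF Exception (?P<vector>\d+) in world (?P<world_id>\d+):(?P<world_name>\S+)"
    r" IP (?P<ip>0x[0-9a-fA-F]+) addr (?P<addr>0x[0-9a-fA-F]+)"
)


@dataclass
class PFHeader:
    vector: int      # exception vector (14 = page fault on x86)
    world_id: int    # numeric ID of the faulting world
    world_name: str  # world name as printed (may be truncated)
    ip: int          # instruction pointer at the time of the fault
    addr: int        # faulting memory address


def parse_pf_header(line: str) -> Optional[PFHeader]:
    """Return the parsed fields, or None if the line has no #PF header."""
    m = PF_HEADER.search(line)
    if m is None:
        return None
    return PFHeader(
        vector=int(m["vector"]),
        world_id=int(m["world_id"]),
        world_name=m["world_name"],
        ip=int(m["ip"], 16),  # int() accepts the "0x" prefix when base is 16
        addr=int(m["addr"], 16),
    )


if __name__ == "__main__":
    hdr = parse_pf_header(
        "#PF Exception 14 in world 2097977:FSUnmapManag IP 0x4200019591ac addr 0xe1800000010"
    )
    print(hdr.world_name, hex(hdr.addr))  # FSUnmapManag 0xe1800000010
```

Grouping parsed headers by `world_name` gives a quick first indication of which pattern a host hit.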
Pattern 2 - FSUnmapManager PSOD:
```
#PF Exception 14 in world 2097977:FSUnmapManag IP 0x4200019591ac addr 0xe1800000010
```
Key identifying stack trace elements:
```
#0 DLM_free (msp=...)
#1 Heap_Free (heap=...)
#2 FS3_HeapMemFree (heapID=...)
#3 FS3_MemFree (realPtr=...)
#4 UnmapAddClustersToProcess (unmapsToProcess=...)
#5 UnmapProcessFromCluster (listHead=...)
#6 ProcessFS_Unmaps ()
#7 UnmapManager (unused=...)
```
Pattern 3 - File IO Related PSOD:
```
#PF Exception 14 in world 3189220:fssAIO IP 0x420005158097 addr 0x10
```
Key identifying stack trace elements:
```
#0 tmalloc_large (nb=8512, m=...)
#1 DLM_malloc (msp=...)
#2 Heap_AlignWithTimeoutAndRA (heap=...)
#3 FS3_HeapMemAlloc (heapID=...)
#4 FS3_MemAlloc (size=...)
#5 Res6_InitCacheEntry (txn=...)
#6 Res6GetRC (txn=...)
#7 Res6_MemLockRC (resType=...)
```
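When many core dumps need triage, the key frames above can be checked mechanically. The Python sketch below is a hypothetical helper, not a VMware tool: it extracts function names from frames written as `#N function (args)` and reports the first pattern whose signature frames all appear. The signature lists are an assumption, hand-picked from the three traces in this article.

```python
import re
from typing import Dict, List, Optional

# Hand-picked signature frames per pattern (an assumption, taken from the
# key stack trace elements listed above). A pattern matches only when every
# listed function appears somewhere in the backtrace.
SIGNATURES: Dict[str, List[str]] = {
    "Pattern 1 - Resource Helper Queue": [
        "DLM_free", "FS3_MemFree", "Res6NewFreeClusterEntry", "Res3_FlushCachesVMFS6",
    ],
    "Pattern 2 - FSUnmapManager": [
        "DLM_free", "FS3_MemFree", "UnmapAddClustersToProcess", "UnmapManager",
    ],
    "Pattern 3 - File IO Related": [
        "tmalloc_large", "DLM_malloc", "FS3_MemAlloc", "Res6_InitCacheEntry",
    ],
}

# Frame lines look like "#4 FS3_MemFree (realPtr=...)"; "#PF ..." headers do not match.
FRAME = re.compile(r"^\s*#\d+\s+(\w+)")


def frame_functions(backtrace: str) -> List[str]:
    """Return function names in frame order."""
    names = []
    for line in backtrace.splitlines():
        m = FRAME.match(line)
        if m:
            names.append(m.group(1))
    return names


def classify(backtrace: str) -> Optional[str]:
    """Return the first pattern whose signature frames are all present, else None."""
    present = set(frame_functions(backtrace))
    for pattern, required in SIGNATURES.items():
        if all(func in present for func in required):
            return pattern
    return None


if __name__ == "__main__":
    trace = """#0 tmalloc_large (nb=8512, m=...)
#1 DLM_malloc (msp=...)
#2 Heap_AlignWithTimeoutAndRA (heap=...)
#3 FS3_HeapMemAlloc (heapID=...)
#4 FS3_MemAlloc (size=...)
#5 Res6_InitCacheEntry (txn=...)"""
    print(classify(trace))  # Pattern 3 - File IO Related
```

Requiring every signature frame (rather than any one) keeps generic heap frames such as `DLM_free` from matching unrelated crashes on their own.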
Common Diagnostic Information:
1. vmkernel.log will typically show memory access errors at the faulting address (here, the Pattern 2 fault address):
   ```
   Cannot access memory at address 0xe1800000010
   ```
2. Core dump analysis often reveals heap corruption involving a chunk of a specific size, 8512 bytes, which is also the size requested by tmalloc_large (nb=8512) in Pattern 3:
   ```
   FREE: mchunkptr: 0x43174f3be5f0 (raw addr 0x43174f3be600); mchunk size=8512
   ```
3. Volume information can be gathered with:
   ```
   vmkfstools -Ph /vmfs/volumes/<volume-name>
   ```
   Example output for an affected volume:
   ```
   VMFS-6.82 (Raw Major Version: 24) file system spanning 1 partitions.
   Mode: public
   Capacity X.X TB, Y.Y TB available, file block size 1 MB
   ```
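The two checks above (chunk size in the core dump, file system version of the volume) can be scripted when comparing several hosts. The Python sketch below assumes the exact line formats quoted above; the function names and the `REPORTED_CHUNK_SIZE` constant are hypothetical, and a matching chunk size is a hint rather than proof of this issue.

```python
import re
from typing import Optional, Tuple

# Assumed formats, taken from the sample lines above.
CHUNK_SIZE = re.compile(r"mchunk size=(\d+)")
VMFS_HEADER = re.compile(r"VMFS-(\d+)\.(\d+) \(Raw Major Version: (\d+)\)")

REPORTED_CHUNK_SIZE = 8512  # chunk size quoted in this article


def chunk_size(line: str) -> Optional[int]:
    """Extract N from '... mchunk size=N', or None if absent."""
    m = CHUNK_SIZE.search(line)
    return int(m.group(1)) if m else None


def vmfs_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, raw_major) from `vmkfstools -Ph` output, or None."""
    m = VMFS_HEADER.search(output)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def is_vmfs6(output: str) -> bool:
    """True when the volume reports a VMFS-6 file system."""
    version = vmfs_version(output)
    return version is not None and version[0] == 6


if __name__ == "__main__":
    free_line = "FREE: mchunkptr: 0x43174f3be5f0 (raw addr 0x43174f3be600); mchunk size=8512"
    vol = "VMFS-6.82 (Raw Major Version: 24) file system spanning 1 partitions."
    print(chunk_size(free_line) == REPORTED_CHUNK_SIZE, is_vmfs6(vol))  # True True
```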
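These patterns typically appear during intensive I/O on VMFS-6 volumes, including resource cache flushes (Pattern 1) and space reclamation (unmap) processing (Pattern 2).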
Cause:
A race condition in the VMFS resource management code can free memory that is still in use (a use-after-free). This occurs specifically during cluster resource entry (RCE) operations when multiple worlds access the same resources concurrently, which is consistent with the page faults inside the heap allocator (DLM_free, tmalloc_large) seen in the backtraces above.
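The Python sketch below illustrates the general shape of such a race with a toy allocator; it is not VMware code and does not describe VMware's actual fix. To keep it deterministic, a single thread replays one fixed interleaving: worker A frees a shared entry while worker B still holds it. `ToyHeap`, `RefCountedEntry` and both interleaving functions are hypothetical names. The reference-counted variant shows one common, generic way to defer the free until the last user is done.

```python
import threading


class ToyHeap:
    """Toy allocator: tracks live allocations so a use-after-free is detectable."""

    def __init__(self):
        self._live = {}
        self._next = 1

    def alloc(self, value):
        handle = self._next
        self._next += 1
        self._live[handle] = value
        return handle

    def free(self, handle):
        if handle not in self._live:
            raise RuntimeError(f"double free of handle {handle}")
        del self._live[handle]

    def read(self, handle):
        if handle not in self._live:
            raise RuntimeError(f"use after free of handle {handle}")
        return self._live[handle]


class RefCountedEntry:
    """Generic guard: memory is freed only when the last reference is released."""

    def __init__(self, heap, value):
        self.heap = heap
        self.handle = heap.alloc(value)
        self.refs = 1                  # reference held by the cache itself
        self.lock = threading.Lock()   # makes count updates atomic under real threads

    def acquire(self):
        with self.lock:
            self.refs += 1
        return self.handle

    def release(self):
        with self.lock:
            self.refs -= 1
            last = self.refs == 0
        if last:
            self.heap.free(self.handle)


def unsafe_interleaving():
    heap = ToyHeap()
    handle = heap.alloc("cluster entry")
    reader_handle = handle           # worker B picks up the entry
    heap.free(handle)                # worker A flushes the cache and frees it
    return heap.read(reader_handle)  # worker B touches freed memory: raises


def safe_interleaving():
    heap = ToyHeap()
    entry = RefCountedEntry(heap, "cluster entry")
    reader_handle = entry.acquire()   # worker B takes its own reference
    entry.release()                   # worker A drops the cache reference; no free yet
    value = heap.read(reader_handle)  # still valid
    entry.release()                   # worker B done; last release frees the entry
    return value


if __name__ == "__main__":
    print(safe_interleaving())  # cluster entry
```

In the unsafe replay the error surfaces at the later access rather than at the free itself, which is why the faulting frame in a real PSOD need not be the code that freed the memory too early.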