ESXi Purple Screen of Death (PSOD) caused by nfcDiskCopyH Exception 13

Article ID: 425902

Products

  • VMware vSphere ESXi
  • VMware vCenter Server

Issue/Introduction

  • The ESXi host fails with a PSOD that reports a General Protection fault (#GP Exception 13).
  • The faulting world (thread) is identified as nfcDiskCopyH.

The PSOD screen shows a line similar to the following:

 #GP Exception 13 in world 7468577:nfcDiskCopyH @ 0x############ (memory address)

/var/run/log/LogEFI:

[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)ESC[45mESC[33;1mVMware ESXi 8.0.3 [Releasebuild-24022510 x86_64]ESC[0m
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI[2099127]: #GP Exception 13 in world 7468577:nfcDiskCopyH @ 0x############ (memory address)
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI[2099127]:
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI[2099127]: Module(s) involved in panic: [nfsclient Built on: MM DD YYYY] [sunrpc Built on: MM DD YYYY] [migrate vmotion protocol: 5.3 vmk interface: 3.0 Built on: MM DD YYYY:HH:MM:SS]
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)cr0=0x8001003d cr2=0x4a29602d6f cr3=0x20a000 cr4=0x14216c
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)FMS=06/cf/2 uCode=0x21000291
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)frame=0x4539eef9bae0 ip=0x42002836e274 err=0x0 rflags=0x10202
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)rax=0x7 rbx=0x43225941e570 rcx=0x5940
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)rdx=0x666e494f49535d0e rbp=0x7 rsi=0x420041800000
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)rdi=0x5941 r8=0x420041800000 r9=0x0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)r10=0x0 r11=0x45ba440e4dd8 r12=0x4200418016c0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)r13=0x45ba440e4dd8 r14=0x10000 r15=0x4322594034bc
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI[2099127]: *PCPU6:7468577/nfcDiskCopyHelper2-########-####-####-####-###########
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI[2099127]: PCPU  0: VVVSSSSSVSSVVUUVUVSSSSSVSVSSSUVSUVISSSSVVSVSSSVSSSUIUISSUSVISSSS
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI[2099127]: PCPU 64: SIUSSIIISSVISUUUSIISISSUSIIIISIS
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)Code start: 0x420028200000 VMK uptime: 37:01:48:18.381
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bba0:[0x42002836e274]MCSLockWait@vmkernel#nover+0x13c stack: 0x0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bbc0:[0x42002836e82d]MCSLockWork@vmkernel#nover+0x2a stack: 0x432080000001
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bbd0:[0x420029a88a03]NFSSchedRemoveRequest@(nfsclient)#<None>+0x148 stack: 0x4321f3c018f8
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bc50:[0x420029a58aca][email protected]#1+0x67 stack: 0x1
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bc70:[0x420029a59fc2][email protected]#1+0x403 stack: 0x4321f3c018a8
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bcf0:[0x420029a5a39e][email protected]#1+0x1d7 stack: 0x45ba0087f018
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bd40:[0x420029a7f020]NFSFileIOSend@(nfsclient)#<None>+0x451 stack: 0x43225943c1d0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bdb0:[0x420029a80a08]NFS_FileIO@(nfsclient)#<None>+0x525 stack: 0x25a
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9be20:[0x420029a80e37]NFSOpFileIO@(nfsclient)#<None>+0x1e8 stack: 0x2710
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bed0:[0x420028241ff4]FSSVec_FileIO@vmkernel#nover+0x21 stack: 0x198a7617
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bef0:[0x4200288bc52b]FSSFileIO@vmkernel#nover+0x17c stack: 0x465fc0189770
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bf50:[0x4200288bc6f5]FSS_AsyncFileIO@vmkernel#nover+0xe stack: 0x465fc019a230
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bf70:[0x420029be086a]NfcDiskCopyIssueAsyncRead@esx#nover+0x63 stack: 0x200000
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bf90:[0x420029be09e9]NfcDiskCopy_Helper@esx#nover+0xba stack: 0x0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9bfe0:[0x4200288d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)0x4539eef9c000:[0x420028344c6f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
[yyyy-mm-ddThh:mm:ss] In(14) LogEFI: cpu6:7468577)base fs=0x0 gs=0x420041800000 Kgs=0x0
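
The signature in the excerpt above can be checked for mechanically. The following is a minimal Python sketch, not a supported tool: the function name find_nfc_gp_panic is hypothetical, and the patterns are taken only from the sample lines shown in this article, so other builds may format these lines differently.

```python
import re

# Panic line format as it appears in the log excerpt in this article.
PANIC_RE = re.compile(r"#GP Exception 13 in world (\d+):(\S+)")

def find_nfc_gp_panic(log_text):
    """Return (world_id, world_name, nfs_in_backtrace) for the first
    #GP Exception 13 raised in an nfcDiskCopyH world, or None."""
    match = PANIC_RE.search(log_text)
    if match is None or not match.group(2).startswith("nfcDiskCopyH"):
        return None
    # The backtrace in this article passes through the NFS client module.
    nfs_in_backtrace = "NFSSchedRemoveRequest@(nfsclient)" in log_text
    return int(match.group(1)), match.group(2), nfs_in_backtrace

# Two lines copied from the excerpt above (timestamps omitted).
sample = (
    "LogEFI[2099127]: #GP Exception 13 in world 7468577:nfcDiskCopyH @ 0x############\n"
    "LogEFI: cpu6:7468577)0x4539eef9bbd0:[0x420029a88a03]"
    "NFSSchedRemoveRequest@(nfsclient)#<None>+0x148 stack: 0x4321f3c018f8\n"
)
print(find_nfc_gp_panic(sample))
```

A match indicates only that the log resembles the signature described here; it does not replace analysis of the core dump.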

Environment

  • VMware vSphere ESXi 8.0 Update 3 builds earlier than Update 3b (the log excerpt above is from build 24022510)

Cause

  • This issue is caused by a race condition in the ESXi kernel storage stack during Network File Copy (NFC) operations such as cloning, migrating, or backing up virtual machines.
  • Memory dump analysis shows that the crash results from a use-after-free:
    • The nfcDiskCopyH world issues an asynchronous read I/O request through NfcDiskCopyIssueAsyncRead().
    • Before that read completes, a separate thread closes and frees the target file object.
    • When the read path then accesses the file, it looks up an entry in fsFileHandleTable[] that is now NULL.
    • Accessing this invalid memory raises a General Protection fault (#GP Exception 13), and the host crashes.
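
The ordering of events described above can be illustrated with a small Python model. This is only a sketch of the general use-after-free pattern: HandleTable and run_race are hypothetical names, the table is a toy stand-in rather than the vmkernel fsFileHandleTable[], and the steps run sequentially so that the outcome is deterministic.

```python
class HandleTable:
    """Toy stand-in for a file handle table; not the vmkernel structure."""

    def __init__(self):
        self.slots = []

    def open(self, data):
        self.slots.append({"data": data})
        return len(self.slots) - 1      # the handle is an index into the table

    def close(self, handle):
        self.slots[handle] = None       # entry freed; the index is now stale

    def read(self, handle):
        entry = self.slots[handle]
        if entry is None:
            # The model reports the stale handle as a controlled error.
            raise RuntimeError("stale handle: file object already freed")
        return entry["data"]


def run_race():
    table = HandleTable()
    handle = table.open(b"disk blocks")
    pending_read = lambda: table.read(handle)  # 1: async read queued, not yet run
    table.close(handle)                        # 2: another thread closes the file
    try:
        pending_read()                         # 3: read runs against the freed entry
    except RuntimeError as exc:
        return str(exc)
    return "read succeeded"


print(run_race())
```

In this model the stale handle is caught and reported as an exception. According to the analysis above, the kernel read path instead accesses the invalid entry directly, which is what produces the #GP Exception 13.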

Resolution

This issue is resolved in VMware ESXi 8.0 Update 3b (build 24280767) and later versions.

  • To fix this issue permanently, update the affected ESXi hosts to ESXi 8.0 Update 3b or later.
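
To check whether a host is older than the fixed build, the version banner can be compared against build 24280767. The following Python sketch assumes the banner format shown in the log excerpt in this article; the function name needs_update is hypothetical, and the check deliberately evaluates only 8.0.3 banners, because build numbers are not comparable across release lines.

```python
import re

FIXED_BUILD = 24280767  # ESXi 8.0 Update 3b, per the Resolution section

# Banner format as printed in the PSOD log excerpt in this article.
BANNER_RE = re.compile(r"VMware ESXi (\d+\.\d+\.\d+) \[Releasebuild-(\d+)")

def needs_update(banner):
    """Return True if an 8.0.3 host is older than the fixed build,
    False if it is at or past it, and None for other versions."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError("unrecognized version banner")
    version, build = match.group(1), int(match.group(2))
    if version != "8.0.3":
        return None  # not evaluated: outside the release line covered here
    return build < FIXED_BUILD

print(needs_update("VMware ESXi 8.0.3 [Releasebuild-24022510 x86_64]"))
```

For the banner from the sample log, the function returns True, meaning the host predates the fixed build.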