PSOD on ESXi hosts stating "involved in Panic Nvidia-Gpu"

Article ID: 418742

Products

VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts fail with a purple diagnostic screen (PSOD) showing a panic error of the form: #######cpu#####@BlueScreen: #PF Exception 14 in world #####:vmx IP ########## addr #######

    VMware ESXi 8.0.3 [Releasebuild-24784735 x86_64]
    #PF Exception 14 in world 8775175:vmx IP 0x420035f9bead addr 0x431fecdbee18
    PTEs : 0x800007b023: 0x810ca23063: 0x0;

    Module(s) involved in panic: [nvidia-gpu 580.82.02 (External) ]
    cr0=0x80010031 cr2=0x431fecdbee18 cr3=0xa55200e000 cr4=0x142768
    FMS=06/cf/2 uCode=0x210002a9
    frame=0x453a9d19ab20 ip=0x420035f9bead err=0x0 rflags=0x10212
    rax=0x41ffcec59e90 rbx=0xffffffb0 rcx=0xffffffff004000d0
    rdx=0xffffffb0 rbp=0x453a9d19acf0 rsi=0x431eecdbee70
    rdi=0x41ffcec59e98 r8=0x431eecdbee70 r9=0x0
    r10=0x3 r11=0x431ee3a564f0 r12=0x41ffcec59e20
    r13=0x431ee3a564f0 r14=0x15 r15=0xffffffff000000f8
    *PCPU46:8775175/vmx
    PCPU 0: VVSVVVSVVVVVVSVVVVVVVVVVVVVVVVVVSVVSSVVSVSVSVVVVVVVSVVVVV
    Code start: 0x420035e00000 VMK uptime: 32:19:51:48.707
    0x453a9d19abe8: [0x420035f9bead]_vmk_Memcpy@vmkernel#nover+0x29 stack: 0x0
    0x453a9d19abf0: [0x420037567951]libosExtractLogs@(nvidia-gpu)#<None>+0x632 stack: 0x420037556d9c
    0x453a9d19abf8: [0x42003656becf]_etext@vmkernel#nover+0x84085 stack: 0x431ee3a48bc8
    base fs=0x0 gs=0x42004b800000 Kgs=0x0
    No disk partition configured to dump data.
    Finalized dump header (18/18) FileDump: Successful
    No port for remote debugger. "Escape" for local debugger.
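
To check whether a captured panic screen matches this signature, the key fields can be pulled out of the text. The following is a minimal sketch, not a VMware or NVIDIA tool: the function name match_signature and the regular expressions are assumptions made for illustration, and the sample text is copied from the screen above.

```python
import re

# Two lines copied from the panic screen shown in this article.
PSOD_TEXT = """\
#PF Exception 14 in world 8775175:vmx IP 0x420035f9bead addr 0x431fecdbee18
Module(s) involved in panic: [nvidia-gpu 580.82.02 (External) ]
"""

# Hypothetical patterns, written for this article's screen layout only.
MODULE_RE = re.compile(r"Module\(s\) involved in panic:\s*\[\s*(\S+)\s+(\S+)")
FAULT_RE = re.compile(
    r"#PF Exception 14 in world (\d+):(\S+) IP (0x[0-9a-f]+) addr (0x[0-9a-f]+)"
)


def match_signature(text):
    """Return key fields if the text matches this article's signature, else None."""
    module = MODULE_RE.search(text)
    fault = FAULT_RE.search(text)
    if not module or not fault:
        return None
    if module.group(1) != "nvidia-gpu":
        return None
    return {
        "module": module.group(1),
        "version": module.group(2),
        "world": fault.group(2),
        "fault_addr": fault.group(4),
    }


result = match_signature(PSOD_TEXT)
print(result)
```

A match only indicates that the nvidia-gpu module was named in a page-fault panic. The backtrace should still be checked for libosExtractLogs and vmk_Memcpy before concluding that it is this issue.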

Environment

vSphere 8.x

Cause

According to the memory dump, vmk_Memcpy() was called to copy 4294967216 (0xffffffb0) bytes of memory from 0x431ec777f118, which exceeded the mapped area of the nvidiaGeneral heap. The copy length (0xffffffb0) appears to be invalid. It was passed by the nvidia-gpu driver function libosExtractLogs().
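
The arithmetic behind the suspicious length can be checked with a short sketch. One possible reading, which is an assumption and not something the memory dump confirms, is that a small negative value was reinterpreted as an unsigned 32-bit length. The same value, 0xffffffb0, also appears in rbx and rdx in the register dump above.

```python
import struct

# Copy length reported in the memory dump analysis.
length = 0xFFFFFFB0

# Interpreted as an unsigned 32-bit integer: the byte count vmk_Memcpy() was asked to copy.
print(length)  # 4294967216

# Assumption for illustration: reinterpret the same 32 bits as a signed integer.
signed = struct.unpack("<i", struct.pack("<I", length))[0]
print(signed)  # -80

# Size in GiB, to show how far the request exceeds any plausible log buffer.
print(length / 2**30)
```

A request of nearly 4 GiB runs far past the end of a heap mapping, which is consistent with the page fault (#PF Exception 14) raised inside vmk_Memcpy.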

Resolution

Engage NVIDIA Support to investigate the nvidia-gpu driver.