ESXi host PSOD : PF exception 14 at address 0x0 running on Nvidia M10 GPU with older 390.* vib


Article ID: 318446


Products

VMware vSphere ESXi

Issue/Introduction

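This article describes how to resolve an ESXi host purple screen of death (PSOD) that occurs on hosts running an NVIDIA M10 GPU with an older 390.* driver VIB.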

Symptoms:
  • VMs that have a dedicated vGPU fail with a crash dump.
  • End users are disconnected while using the VM.
  • The ESXi host may fail with a PSOD when deploying a vSGA-enabled linked-clone desktop pool in Horizon View.
       The backtrace is similar to the following:
2019-10-07T18:05:12.458Z cpu1:2101958)@BlueScreen: #PF Exception 14 in world 2101958:hostd-IO IP 0x4180284fa6de addr 0x45026c716000
PTEs:0x9040057023;0x8031176063;0x8048a0f063;0x0;
2019-10-07T18:05:12.458Z cpu1:2101958)Code start: 0x418027800000 VMK uptime: 60:16:15:47.722
2019-10-07T18:05:12.459Z cpu1:2101958)0x451b3431b9e0:[0x4180284fa6de]_nv010143rm@(nvidia)#<None>+0xb2 stack: 0x451b3431ba60
2019-10-07T18:05:12.459Z cpu1:2101958)0x451b3431ba30:[0x4180285abca6]_nv012468rm@(nvidia)#<None>+0xcf stack: 0x451b3431bc20
2019-10-07T18:05:12.459Z cpu1:2101958)0x451b3431ba70:[0x4180286b97d7]_nv017782rm@(nvidia)#<None>+0x4d0 stack: 0x451b3431ba98
2019-10-07T18:05:12.459Z cpu1:2101958)0x451b3431bae0:[0x418028944be4]_nv021250rm@(nvidia)#<None>+0x365 stack: 0x451b3431bb3c
2019-10-07T18:05:12.460Z cpu1:2101958)0x451b3431bd20:[0x41802894505d]_nv021218rm@(nvidia)#<None>+0x8a stack: 0x0
2019-10-07T18:05:12.460Z cpu1:2101958)0x451b3431bd60:[0x41802894ba86]rm_isr@(nvidia)#<None>+0x4b stack: 0x451b3431bdb0
2019-10-07T18:05:12.460Z cpu1:2101958)0x451b3431bd80:[0x4180289af0c0]nv_interrupt_handler@(nvidia)#<None>+0xdd stack: 0x8a
2019-10-07T18:05:12.460Z cpu1:2101958)0x451b3431bdc0:[0x4180278ef47b]IntrCookieBH@vmkernel#nover+0x1e0 stack: 0x0
2019-10-07T18:05:12.461Z cpu1:2101958)0x451b3431be60:[0x4180278cd4ef]BH_DrainAndDisableInterrupts@vmkernel#nover+0x100 stack: 0x451b3431bf30
2019-10-07T18:05:12.461Z cpu1:2101958)0x451b3431bef0:[0x4180278f0e7a]IntrCookie_VmkernelInterrupt@vmkernel#nover+0xb3 stack: 0x44
2019-10-07T18:05:12.461Z cpu1:2101958)0x451b3431bf10:[0x41802794731c]IDT_IntrHandler@vmkernel#nover+0x9d stack: 0x0
2019-10-07T18:05:12.461Z cpu1:2101958)0x451b3431bf30:[0x418027963066]gate_entry@vmkernel#nover+0x67 stack: 0x0
  • Log entries in vmkernel.log leading up to the crash are similar to the following:
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000e0 count was zero (count=0)
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000e1 count was zero (count=1)
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000e2 count was zero (count=2)
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000e3 count was zero (count=3)
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000e9 count was zero (count=9)
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000f3 count was zero (count=19)
2019-10-07T18:05:11.980Z cpu73:2097567)WARNING: VmMemPin: 333: vm 3591873: bpn 0x1000000fd count was zero (count=29)
2019-10-07T18:05:12.084Z cpu73:2097567)WARNING: PFrame: vm 3591873: 2489: Deallocating pinned bpn 0x1001dafab, pinCount 1 throttle 0.
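To check whether a collected log matches the signatures shown above, a small script can count the relevant lines. The following is a minimal sketch, not a supported tool: the regular expressions are derived only from the example excerpts in this article, the function names are hypothetical, and real logs may differ in detail.

```python
import re

# Patterns derived from the example excerpts in this article (assumption:
# other builds may format these lines slightly differently).
PSOD_RE = re.compile(r"@BlueScreen: #PF Exception 14 in world \d+:\S+")
NVIDIA_FRAME_RE = re.compile(r"\]\S+@\(nvidia\)#")
VMMEMPIN_RE = re.compile(
    r"WARNING: VmMemPin: \d+: vm \d+: bpn 0x[0-9a-fA-F]+ count was zero"
)


def summarize(lines):
    """Count the lines that match each signature."""
    counts = {"psod": 0, "nvidia_frames": 0, "vmmempin_warnings": 0}
    for line in lines:
        if PSOD_RE.search(line):
            counts["psod"] += 1
        if NVIDIA_FRAME_RE.search(line):
            counts["nvidia_frames"] += 1
        if VMMEMPIN_RE.search(line):
            counts["vmmempin_warnings"] += 1
    return counts


def matches_signature(lines):
    """True if a #PF Exception 14 PSOD appears together with nvidia frames."""
    counts = summarize(lines)
    return counts["psod"] > 0 and counts["nvidia_frames"] > 0
```

The functions accept any iterable of log lines, for example an open file handle for a copy of vmkernel.log or a PSOD backtrace saved as text. A match only suggests that the log resembles the examples in this article; it does not confirm the root cause.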

 

 


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
Disclaimer: VMware is not responsible for the reliability of any data, opinions, advice, or statements made on third-party websites. Inclusion of such links does not imply that VMware endorses, recommends or accepts any

Environment

VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5

Cause

This issue is caused by heap corruption in the NVIDIA driver.


Resolution

This is a known issue with the NVIDIA driver and is fixed in the 390.93 VIB.

To check the currently installed version, run the command esxcli software vib list.

For more information, refer to the NVIDIA GRID release notes.

NVIDIA reports that this issue is fixed in VIB 390.93 or later.
NVIDIA recommends updating to GRID 8.3 or 10.0 to resolve this issue.
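The version check can also be scripted against saved output of esxcli software vib list. The following is a minimal sketch under stated assumptions: it assumes the output is a whitespace-separated table with the VIB name in the first column and the version in the second, that the NVIDIA VIB name contains "nvidia", and that the version string starts with major.minor (for example 390.72-...). The function names are hypothetical.

```python
import re

# Fixed driver version stated in this article.
FIXED_VERSION = (390, 93)


def nvidia_vib_versions(vib_list_output):
    """Return {vib_name: (major, minor)} for rows whose name contains 'nvidia'.

    Assumes name is the first column and version is the second column.
    """
    found = {}
    for row in vib_list_output.splitlines():
        fields = row.split()
        if len(fields) < 2 or "nvidia" not in fields[0].lower():
            continue
        match = re.match(r"(\d+)\.(\d+)", fields[1])
        if match:
            found[fields[0]] = (int(match.group(1)), int(match.group(2)))
    return found


def needs_update(version):
    """True for a 390.* driver older than 390.93.

    The minor part is compared as an integer, so 390.113 counts as newer
    than 390.93.
    """
    return version[0] == 390 and version < FIXED_VERSION
```

On the host itself, piping the command through grep -i nvidia narrows the list to the relevant row. Treat the script's result as a hint and confirm the exact fixed build against the NVIDIA GRID release notes.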


Workaround:
None

Additional Information

Impact/Risks:
None