ESXi 7.0 U1 host fails with a PSOD during a View Instant Clone reprovision operation at a scale of 2000 VMs with NFSv4.1


Article ID: 367534


Updated On:

Products

VMware vSphere ESX 7.x
VMware vSphere ESXi

Issue/Introduction

This article describes a purple screen of death (PSOD) that can occur on an ESXi 7.0 U1 host and how to avoid it.

You see the following symptoms on an ESXi 7.0 U1 host with NFSv4.1 storage:

  • The ESXi host fails with a PSOD while under the load of an Instant Clone pool reprovision operation of 2000 VMs.
  • You see a backtrace similar to the following:

 2020-10-07T13:07:09.864Z cpu36:1001599386)Backtrace for current CPU #36, worldID=1001599386, fp=0x430874608c80
2020-10-07T13:07:09.874Z cpu36:1001599386)0x4538da49aeb0:[0x420017c7c1d2]LibAIODoAsyncIO@vmkernel#nover+0x12a stack: 0x0, 0x0, 0x2710, 0x4308745d4be8, 0x4308745d4ba0
2020-10-07T13:07:09.888Z cpu36:1001599386)0x4538da49af00:[0x420017d2494c]HelperQueueFunc@vmkernel#nover+0x875 stack: 0x0, 0x4308734c1f4c, 0x4308734c1ea8, 0xb72f4fd0e5c0, 0x4538da4a1000
2020-10-07T13:07:09.904Z cpu36:1001599386)0x4538da49afd0:[0x42001810f88c]CpuSched_StartWorld@vmkernel#nover+0xf9 stack: 0x0, 0x0, 0x0, 0x420017d0bb9c, 0x0
2020-10-07T13:07:09.954Z cpu36:1001599386)^[[45m^[[33;1mVMware ESXi 7.0.1 [build-16850804 x86_64]^[[0m
#PF Exception 14 in world 1001599386:fdsAIO IP 0x420017c7c1d2 addr 0xe8
PTEs:0x0;

  • In vmkernel.log, you see entries similar to the following:

2020-10-07T13:07:07.014Z cpu10:1001393546 opID=51ed2475)J6: CommitOnDiskTxn:3620: 'VMFS6-UNITY': world: 1001393546 done with onDiskTxn: 0x43132528dbb0 for file (<FD c6 r199>) numMemTxns: 1 with status Success.
2020-10-07T13:07:07.016Z cpu10:1001393546 opID=51ed2475)Fil3: FS3_WaitOrCancelExistingScanner:3179: Cancel requested, disabling the scanner. file Data 0x43132566c4b0
2020-10-07T13:07:07.242Z cpu36:1001599386)World: ResetToVMKOnPanic:3185: PRDA 0x420049000000 ss 0x0 ds 0xf50 es 0xf50 fs 0x0 gs 0x0
2020-10-07T13:07:07.242Z cpu36:1001599386)World: ResetToVMKOnPanic:3187: TR 0xf68 GDT 0xfffffffffca01000 (0xffff) IDT 0xfffffffffc408000 (0xffff)
2020-10-07T13:07:07.242Z cpu36:1001599386)World: ResetToVMKOnPanic:3188: CR0 0x8005003f CR3 0x19ae97000 CR4 0x142660
2020-10-07T13:07:09.864Z cpu36:1001599386)Backtrace for current CPU #36, worldID=1001599386, fp=0x430874608c80
2020-10-07T13:07:09.874Z cpu36:1001599386)0x4538da49aeb0:[0x420017c7c1d2]LibAIODoAsyncIO@vmkernel#nover+0x12a stack: 0x0, 0x0, 0x2710, 0x4308745d4be8, 0x4308745d4ba0
2020-10-07T13:07:09.888Z cpu36:1001599386)0x4538da49af00:[0x420017d2494c]HelperQueueFunc@vmkernel#nover+0x875 stack: 0x0, 0x4308734c1f4c, 0x4308734c1ea8, 0xb72f4fd0e5c0, 0x4538da4a1000
2020-10-07T13:07:09.904Z cpu36:1001599386)0x4538da49afd0:[0x42001810f88c]CpuSched_StartWorld@vmkernel#nover+0xf9 stack: 0x0, 0x0, 0x0, 0x420017d0bb9c, 0x0
2020-10-07T13:07:09.954Z cpu36:1001599386)^[[45m^[[33;1mVMware ESXi 7.0.1 [build-16850804 x86_64]^[[0m
#PF Exception 14 in world 1001599386:fdsAIO IP 0x420017c7c1d2 addr 0xe8
PTEs:0x0


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
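
To check a saved log or PSOD text for these symptoms, a small script can look for the two most distinctive strings in the excerpts above: the page fault in an fdsAIO world and the LibAIODoAsyncIO frame in the backtrace. The following is a minimal sketch, not an official diagnostic tool. The function name and the choice of these two strings as a signature are assumptions made for illustration, and a match only suggests that the symptoms resemble this issue.

```python
import re

# Signature strings taken from the log excerpts in this article.
# Treating these two patterns as "the" signature is an assumption of this sketch.
FAULT_RE = re.compile(r"#PF Exception 14 in world \d+:fdsAIO\b")
FRAME_RE = re.compile(r"\]LibAIODoAsyncIO@vmkernel#nover\+0x[0-9a-f]+")


def matches_psod_signature(log_text: str) -> bool:
    """Return True if the text contains both the fdsAIO page fault
    and a LibAIODoAsyncIO backtrace frame."""
    return bool(FAULT_RE.search(log_text)) and bool(FRAME_RE.search(log_text))


# Sample lines copied from the excerpts above.
sample = (
    "2020-10-07T13:07:09.874Z cpu36:1001599386)0x4538da49aeb0:"
    "[0x420017c7c1d2]LibAIODoAsyncIO@vmkernel#nover+0x12a stack: 0x0, 0x0\n"
    "#PF Exception 14 in world 1001599386:fdsAIO IP 0x420017c7c1d2 addr 0xe8\n"
)

if __name__ == "__main__":
    print(matches_psod_signature(sample))
```

Requiring both patterns keeps the check conservative: a LibAIODoAsyncIO frame or a page fault on its own is not treated as a match.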

Cause

This issue is caused by a race condition in ESXi while accessing aioHandle.
Under VDI scale or stress load, this race condition can cause the ESXi host to fail with a PSOD.

Resolution

This is a known issue in ESXi 7.0 U1.
It is resolved in patch ESXi_7.0.1-0.25.17325551. To download the patch, go to the Broadcom Support page.
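
To see whether a host predates the fix, you can compare the build number in its version banner with the build number in the patch name. The following is a minimal sketch under stated assumptions: the function name and regular expression are hypothetical, the banner format is taken from the PSOD excerpt in this article, and the sketch assumes that build numbers increase monotonically within the 7.0.1 release line. Releases other than 7.0.1 are out of scope and are reported as not affected.

```python
import re

# Build number taken from the patch name ESXi_7.0.1-0.25.17325551.
FIXED_BUILD = 17325551

# Matches banners such as "VMware ESXi 7.0.1 [build-16850804 x86_64]".
BANNER_RE = re.compile(r"ESXi (\d+\.\d+\.\d+)\D*build-(\d+)")


def predates_fix(banner: str) -> bool:
    """Return True if the banner reports a 7.0.1 build older than the fixed build."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError(f"unrecognized version banner: {banner!r}")
    version, build = match.group(1), int(match.group(2))
    # Only the 7.0.1 (7.0 U1) line is covered by this article.
    return version == "7.0.1" and build < FIXED_BUILD


if __name__ == "__main__":
    # Banner copied from the PSOD excerpt in this article.
    print(predates_fix("VMware ESXi 7.0.1 [build-16850804 x86_64]"))
```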

Workaround:
To work around this issue, restart the ESXi host and enable pool provisioning.