CPU lock-ups or crashes when using large VMFS6 datastore in VMware ESXi 7.0

Article ID: 317906

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
In the vmkernel logs, you see error messages similar to:

 YYYY-MM-DDT23:39:13.460Z cpu30:2097385)WARNING: Heartbeat: 849: PCPU 2 didn't have a heartbeat for 8 seconds; *may* be locked up.
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)ALERT: NMI: 694: NMI IPI: RIPOFF(base):RBP:CS [0x107c73d(0x42001b400000):0x3658:0xf48] (Src 0x1, CPU2)
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b320:[0x42001c47c73c]Res6AffMgrComputeSortIndices@esx#nover+0x359 stack: 0x430eedde6670
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b3b0:[0x42001c483e46]Res6AffMgrGetCluster@esx#nover+0xb1f stack: 0xd
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b4b0:[0x42001c485491]Res6AffMgr_AllocResourcesInt@esx#nover+0x40a stack: 0x430eef635d60
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b680:[0x42001c4860e2]Res6AffMgr_AllocResources@esx#nover+0x1b stack: 0x0
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b6c0:[0x42001c4387cc]Fil3_AllocateBlocksVMFS6@esx#nover+0x2ad stack: 0x0
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b780:[0x42001c43f267]Fil3PlugFileHoleIntVMFS6@esx#nover+0x4ac stack: 0x430ef013ea00
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b8b0:[0x42001c43f9a1]Fil3_PlugFileHoleTxnVMFS6@esx#nover+0x5a stack: 0x430eeddf1000
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9b920:[0x42001c403d6a]Fil3_FileIOInt@esx#nover+0x1adb stack: 0x4538866a1980
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9bca0:[0x42001c406dde]Fil3_FileIOIntWithRetry@esx#nover+0xe7 stack: 0x0
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9bd40:[0x42001c4070c2]Fil3_FileIOLegacy@esx#nover+0x137 stack: 0x34d6b90787726
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9be40:[0x42001b4397ab]FSSVec_FileIO@vmkernel#nover+0x20 stack: 0x43079572eb90
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9be60:[0x42001b4370c0]FSSFileIO@vmkernel#nover+0x1dd stack: 0x0
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9bec0:[0x42001b437231]FSS_AsyncFileIO@vmkernel#nover+0xe stack: 0x0
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9bee0:[0x42001b4509b3]LibAIODoAsyncIO@vmkernel#nover+0x74 stack: 0xffff8002e7932300
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9bf20:[0x42001b4d1e91]HelperQueueFunc@vmkernel#nover+0x29e stack: 0x430795681308
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9bfe0:[0x42001b769481]CpuSched_StartWorld@vmkernel#nover+0x82 stack: 0x0
 YYYY-MM-DDT23:39:13.460Z cpu2:2189304)0x453895d9c000:[0x42001b4be69f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
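
To check whether a saved copy of a vmkernel log shows this pattern, a small script can look for the PCPU heartbeat warning together with backtrace frames from the VMFS6 resource allocation path. The following is a minimal sketch, not a VMware tool: the function names and regular expressions are assumptions derived only from the sample log lines above, and a match indicates a similar signature rather than a confirmed diagnosis.

```python
import re

# Heartbeat warning logged when a physical CPU (PCPU) stops responding.
# Pattern is derived from the sample log above (assumption, not an official format).
HEARTBEAT_RE = re.compile(
    r"WARNING: Heartbeat: \d+: PCPU (\d+) didn't have a heartbeat for (\d+) seconds"
)

# Backtrace frames from the VMFS6 resource allocation path seen in the sample log.
ALLOC_FRAME_RE = re.compile(r"\](Res6AffMgr\w*|Fil3_AllocateBlocksVMFS6)@esx")


def scan_vmkernel_log(lines):
    """Return (lockups, frames).

    lockups: list of (pcpu, seconds) tuples from heartbeat warnings.
    frames:  list of allocation-path function names found in backtraces.
    """
    lockups = []
    frames = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            lockups.append((int(match.group(1)), int(match.group(2))))
            continue
        match = ALLOC_FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return lockups, frames


def matches_signature(lines):
    """True when both a heartbeat warning and an allocation-path frame appear."""
    lockups, frames = scan_vmkernel_log(lines)
    return bool(lockups) and bool(frames)
```

For example, opening a copy of the log with `open(path, errors="replace")` and passing the file object to `matches_signature` scans it line by line. The check does not verify that the warning and the backtrace belong to the same event, so review the surrounding log lines before drawing conclusions.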

Environment

VMware vSphere ESXi 7.0.x

Cause

This issue is caused by the resource allocation path for large thin-provisioned VMDKs, which can consume high CPU without making progress in allocating resources.

Resolution

This is a known issue.

This issue is resolved in ESXi 7.0 Update 2c. See the release notes:

https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u2c-release-notes.html#:~:text=PR%202737934%3A%C2%A0If%20you%20use%20very%20large%20VMFS6%20datastores
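
To confirm that a 7.0.x host is at or beyond the fixed release, the build number reported by the host can be compared with the build number listed in the release notes. The sketch below is illustrative only: it assumes a version line in the form `VMware ESXi 7.0.2 build-NNNNNNNN`, the function name is hypothetical, and `fixed_build` must be supplied by the reader from the release notes rather than taken from this example.

```python
import re

# Assumed format of the host version line: "VMware ESXi <major>.<minor>.<patch> build-<number>"
VERSION_RE = re.compile(r"ESXi (\d+)\.(\d+)\.\d+ build-(\d+)")


def is_at_or_past_build(version_line, fixed_build):
    """Return True if a 7.0.x host's build number is >= fixed_build.

    fixed_build is a caller-supplied value (look it up in the release notes).
    Raises ValueError for unparseable input or for releases other than 7.0.x,
    because build numbers are only compared within the 7.0 line here.
    """
    match = VERSION_RE.search(version_line)
    if match is None:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    major, minor, build = (int(group) for group in match.groups())
    if (major, minor) != (7, 0):
        raise ValueError("this check only applies to ESXi 7.0.x hosts")
    return build >= fixed_build
```

The build numbers used to exercise this function should come from the host and from the release notes; no specific build number is implied here.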