2024-07-09T04:05:21.283Z cpu39:2916506)NFS41: NFS41SetSchedQueuePolicy:3056: Mismatch! sched worldID:2916341 worldID:2916341. schedWorld:0x4317b085a880 schedpolicy:0x4538e659bdf0
2024-07-09T04:05:21.355Z cpu39:2916506)World: 3072: PRDA 0x420049c00000 ss 0x0 ds 0x10b es 0x10b fs 0x10b gs 0x0
2024-07-09T04:05:21.355Z cpu39:2916506)World: 3074: TR 0xf58 GDT 0x45384004e000 (0xf77) IDT 0x420018950000 (0xfff)
2024-07-09T04:05:21.355Z cpu39:2916506)World: 3075: CR0 0x80010031 CR3 0x20ea3db000 CR4 0x142768
2024-07-09T04:05:21.392Z cpu39:2916506)Backtrace for current CPU #39, worldID=2916506, fp=0x4317b085a880
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bc88:[0x420018905530]MCSLockWork@vmkernel#nover+0x8 stack: 0xa4435f7, 0x431a22e01240, 0x420019ef6f95, 0x4308fac10450, 0x420019ee9e5c
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bc90:[0x420019efaa9f]NFSSched_DestroySchedQueue@(nfsclient)#<None>+0x1c stack: 0x431a22e01240, 0x420019ef6f95, 0x4308fac10450, 0x420019ee9e5c, 0x430e4ca01660
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bcb0:[0x420019ef6f94]NFSVolume_DestroySchedQHandle@(nfsclient)#<None>+0x11 stack: 0x430e4ca01660, 0x0, 0x1, 0x420018ceaccd, 0x430e4ca01aa0
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bcc0:[0x420019ee9e5b]NFSOpCloseFile@(nfsclient)#<None>+0xe0 stack: 0x1, 0x420018ceaccd, 0x430e4ca01aa0, 0x0, 0x1
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bd90:[0x42001883b564]FSSVec_CloseFile@vmkernel#nover+0x1d stack: 0x4308fac10544, 0x420018840990, 0x149c05610, 0x420000000000, 0x420049c05ab0
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bda0:[0x4200188373dd]FSS_DoCloseFile@vmkernel#nover+0x6e stack: 0x149c05610, 0x420000000000, 0x420049c05ab0, 0x0, 0xa4435f7
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bdb0:[0x42001884098f]BC_CloseFile@vmkernel#nover+0x70 stack: 0x420049c05ab0, 0x0, 0xa4435f7, 0x4308fac10450, 0x0
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659be00:[0x4200188376ca]FSS_CloseFile@vmkernel#nover+0x87 stack: 0x10, 0x4308f3dd9dd0, 0x4308fac10450, 0x45d9022da868, 0x45d9025b37b0
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659be50:[0x420018cd8dc4]UserVmfs_Close@vmkernel#nover+0x35 stack: 0xa, 0x45d9025b37b0, 0xa, 0x420018cb936c, 0x430f77002010
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659be80:[0x420018cb936b]UserObj_ReleaseWithoutCartel@vmkernel#nover+0x10 stack: 0xa, 0x420018cbb754, 0x176, 0x430f7701a7f0, 0x0
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bea0:[0x420018cbb753]UserObj_FDClose@vmkernel#nover+0x178 stack: 0x0, 0x45d9025b37b0, 0x430f77002010, 0x4538e659bf40, 0x3
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bf00:[0x420018d07e86]LinuxFileDesc_Close@vmkernel#nover+0x1b stack: 0xceb99b1c0, 0x4538e659bfd0, 0x0, 0x0, 0x0
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bf10:[0x420018cb4863]User_LinuxSyscallHandler@vmkernel#nover+0x1a4 stack: 0x0, 0x0, 0x0, 0x42001894e068, 0x10b
2024-07-09T04:05:21.392Z cpu39:2916506)0x4538e659bf40:[0x42001894e067]gate_entry@vmkernel#nover+0x68 stack: 0x0, 0x3, 0xce6abdfcd, 0xca5030310, 0xca504e708
2024-07-09T04:05:21.416Z cpu39:2916506)ESC[45mESC[33;1mVMware ESXi 7.0.3 [Releasebuild-22348816 x86_64]ESC[0m#PF Exception 14 in world 2916506:vmx-vcpu-0:v IP ######### addr ######
#0 Atomic_Read16 (var=0x9e) at bora/public/vm_atomic.h:2792
#1 MCSTryLockCommon (lock=0x9c) at bora/vmkernel/main/mcslock.c:1160
#2 MCSLockCommonInt (ra=0x0, lock=0x9c) at bora/vmkernel/main/mcslock.c:2223
#3 MCSLockWork (lock=lock@entry=0x9c) at bora/vmkernel/main/mcslock.c:2305
#4 0x0000420019efaaa0 in MCS_Lock (lock=0x9c) at bora/vmkernel/private/mcslock.h:261
#5 NFSSched_DestroySchedQueue (schedQ=0x4317b085a880) at bora/modules/vmkernel/nfsclient/nfsSched.c:2549
#6 0x0000420019ef6f95 in NFSVolume_DestroySchedQHandle (mpe=mpe@entry=0x431a22e01240, fhID=fhID@entry=172242423, schedQHandle=<optimized out>) at bora/modules/vmkernel/nfsclient/nfsVolume.c:4469
#7 0x0000420019ee9e5c in NFSOpCloseFile (file=0x4308fac10450, fhID=172242423) at bora/modules/vmkernel/nfsclient/nfsClient.c:4500
#8 0x000042001883b565 in FSSVec_CloseFile (desc=<optimized out>, fhID=<optimized out>) at bora/vmkernel/filesystems/fsSwitchVec.c:459
#9 0x00004200188373de in FSS_DoCloseFile (fileDesc=fileDesc@entry=0x4308fac10450, fhid=fhid@entry=172242423, openFlags=<optimized out>, openFlags@entry=1, failedOpen=failedOpen@entry=0 '\000') at bora/vmkernel/filesystems/fsSwitch.c:4052
#10 0x0000420018840990 in BC_CloseFile (desc=0x4308fac10450, fhid=172242423, openFlags=1, failedOpen=<optimized out>) at bora/vmkernel/filesystems/caches/bufferCache2.c:3194
#11 0x00004200188376cb in FSSCloseFile (failedOpen=0 '\000', openFlags=<optimized out>, fhid=172242423, fileDesc=0x4308fac10450) at bora/vmkernel/filesystems/fsSwitch.c:4258
#12 FSS_CloseFile (fileHandleID=172242423) at bora/vmkernel/filesystems/fsSwitch.c:4259
#13 0x0000420018cd8dc5 in UserVmfs_Close (obj=0x45d9025b37b0) at bora/vmkernel/user/userVmfs.c:2091
#14 0x0000420018cb936c in UserObj_ReleaseWithoutCartel (obj=0x45d9025b37b0) at bora/vmkernel/user/userObj.c:2141
#15 0x0000420018cbb754 in UserObj_ReleaseWithoutCartel (obj=<optimized out>) at bora/vmkernel/user/userObj.c:4591
#16 UserObj_Release (obj=<optimized out>, uci=0x430f77002010) at bora/vmkernel/user/userObj.c:2115
#17 UserObj_FDClose (uci=0x430f77002010, fd=<optimized out>) at bora/vmkernel/user/userObj.c:4606
#18 0x0000420018d07e87 in LinuxFileDesc_Close (fd=<optimized out>) at bora/vmkernel/user/linuxFileDesc.c:1081
#19 0x0000420018cb4864 in User_LinuxSyscallHandler (fullFrame=0x4538e659bf40) at bora/vmkernel/user/user.c:2057
#20 0x000042001894e068 in gate_entry ()
#21 0x0000000ce6abdfcd in ?? ()
ESXi 7.0.x
VMware by Broadcom is aware of this issue and is working on a fix.
SSH to the ESXi host and run this command:
# esxcli system module parameters set -m nfs41client -p fileBasedScheduler=0
Then run this command to confirm that the NFS 4.1 file-based scheduler is disabled:
# esxcli system module parameters list -m nfs41client
Name                Type  Value  Description
------------------  ----  -----  -----------
fileBasedScheduler  bool  0      Enable/Disable file based scheduler for NFSv41 (default: 1)
Then reboot the ESXi host.
Perform these steps on every host that has NFS 4.1 datastores mounted.
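When many hosts need checking, the verification step can be scripted. The following is a minimal sketch, assuming a POSIX shell with awk; `fbs_value` and `fbs_is_disabled` are hypothetical helper names, not part of esxcli. It only parses the text printed by the list command shown earlier, and it assumes that a blank Value column means the parameter was never set.

```shell
#!/bin/sh
# Sketch: report the fileBasedScheduler setting from the text printed by
#   esxcli system module parameters list -m nfs41client
# fbs_value and fbs_is_disabled are hypothetical helper names.

fbs_value() {
    # Reads the parameter table on stdin and prints:
    #   0 or 1    the Value column for fileBasedScheduler
    #   unset     the row exists but the Value column is blank (assumption:
    #             the module then uses the default named in the description)
    #   missing   no fileBasedScheduler row was found
    awk '$1 == "fileBasedScheduler" {
             if ($3 == "0" || $3 == "1") print $3; else print "unset"
             found = 1
         }
         END { if (!found) print "missing" }'
}

fbs_is_disabled() {
    # Succeeds only when the table on stdin shows fileBasedScheduler = 0.
    [ "$(fbs_value)" = "0" ]
}
```

On a host, the output of the list command would be piped in, for example `esxcli system module parameters list -m nfs41client | fbs_value`. The parameter takes effect only after the reboot described above, so a printed `0` confirms the stored setting, not the running state.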
What is the impact of disabling the file-based scheduler (FBS)?
With FBS, policies (IOPS, throughput, etc.) can be set per vmdk of a VM.
Without FBS, policies apply per VM, per datastore.
For example, consider a VM with vmdk1 and vmdk2 on datastore NFS1, and vmdk3 and vmdk4 on datastore NFS2.
Assume a policy of 100 IOPS is set per vmdk.
With FBS: NFS allows only 100 IOPS for each of the above vmdks (per VM, per vmdk).
Without FBS: vmdk1 and vmdk2 share a cumulative limit of 200 IOPS. For instance, vmdk1 may get 150 IOPS while vmdk2 gets 50 IOPS.
The same applies to vmdk3 and vmdk4.
If you have no such policies on your vmdks, this change has no impact.
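The arithmetic in the example can be written out as a toy model. This sketch only reproduces the numbers from the example (four vmdks, two datastores, 100 IOPS per vmdk); it does not query or configure ESXi, and `LAYOUT`, `PER_VMDK_LIMIT`, `limit_with_fbs`, and `limit_without_fbs` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Toy model of the example above; it does not talk to ESXi.
# All names here are hypothetical and exist only for illustration.

# vmdk:datastore pairs for the single VM in the example.
LAYOUT="vmdk1:NFS1 vmdk2:NFS1 vmdk3:NFS2 vmdk4:NFS2"
PER_VMDK_LIMIT=100

# With FBS: every vmdk is capped individually at its own policy value.
limit_with_fbs() {
    echo "$PER_VMDK_LIMIT"
}

# Without FBS: the VM's vmdks on datastore $1 share one cumulative cap,
# modelled here as the sum of their per-vmdk policy values.
limit_without_fbs() {
    ds=$1
    total=0
    for entry in $LAYOUT; do
        if [ "${entry#*:}" = "$ds" ]; then
            total=$((total + PER_VMDK_LIMIT))
        fi
    done
    echo "$total"
}
```

Here `limit_without_fbs NFS1` prints 200, the pooled budget that vmdk1 and vmdk2 may split unevenly (for instance 150 and 50), while `limit_with_fbs` prints 100 for any single vmdk.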