ESXi host PSOD after upgrading to ESXi 8.0 U3e during migration of VMs using NVMe array

Article ID: 397926

Products

VMware vSphere ESXi

Issue/Introduction

  • An ESXi host fails with a Purple Diagnostic Screen (PSOD) when VMs are migrated on an NVMe array.
  • The issue occurs after the ESXi host is upgraded to 8.0 U3e (build 24674464).
  • The following backtrace is observed:
    cpu4:2106190)Backtrace for current CPU #4, worldID=2106190, fp=0x4539c2a9f000
    cpu4:2106190)0x4539c771bd10:[0x420034a58f24]NVMECreateSCSICmd@vmkernel#nover+0x548 stack: 0x452107350000, 0x20, 0x2ffffffff, 0x420034a65501, 0x0
    cpu4:2106190)0x4539c771bda0:[0x420034a5929a]NVME_ExecuteSCSICmd@vmkernel#nover+0xbf stack: 0xc771f100, 0x430d506554c0, 0x0, 0x2, 0x201da
    cpu4:2106190)0x4539c771be10:[0x420034a53e0f]VNVMEExecuteCommandInt@vmkernel#nover+0x434 stack: 0xd62b62ac5e, 0x420034a65d48, 0x4539c771f300, 0x430d50652780, 0x20221e
    cpu4:2106190)0x4539c771be80:[0x420034a54585]VNVME_VmkExecuteCmd@vmkernel#nover+0x252 stack: 0x0, 0x0, 0x430d50652780, 0xd90001, 0x430d50655440
    cpu4:2106190)0x4539c771bf40:[0x420034a5ab4b]NVMEProcessRequestRing@vmkernel#nover+0x11c stack: 0x452107350000, 0x420034aafe45, 0x0, 0x0, 0x0
    cpu4:2106190)0x4539c771bfb0:[0x420034a69a3d]VSCSIWorldFunc@vmkernel#nover+0x92 stack: 0x4539c771f100, 0x0, 0x0, 0x420034adc88f, 0x0
    cpu4:2106190)0x4539c771bfe0:[0x420034adc88e]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0, 0x420034544fb0, 0x0, 0x0, 0x0
    cpu4:2106190)0x4539c771c000:[0x420034544faf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0

Environment

VMware vSphere ESXi 8.0 U3e

Cause

The MAXIMUM UNMAP BLOCK DESCRIPTOR COUNT field in the Block Limits VPD page defines how many UNMAP block descriptors a device can handle per UNMAP command. A value of 00000000h indicates that the device does not support the UNMAP command, which can cause issues with space reclamation. On affected hosts, the corresponding advanced option /Scsi/NvmeMaxUnmapBlockDescriptorCount is 0, as shown in step 1 of the workaround below.
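The option can also be inspected with esxcli, which additionally reports the option's description, default value, and allowed range. This is a hedged equivalent of the esxcfg-advcfg -g command used in the workaround; the exact output fields may vary by ESXi build.

# Show full metadata for the advanced option (description, default, min/max)
esxcli system settings advanced list -o /Scsi/NvmeMaxUnmapBlockDescriptorCount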

Resolution

Broadcom is aware of this issue, and a fix will be included in an upcoming release.

Workaround:

1. Check the current value of the config option:

esxcfg-advcfg -g /Scsi/NvmeMaxUnmapBlockDescriptorCount
Value of NvmeMaxUnmapBlockDescriptorCount is 0

2. Update the config option to 255:

esxcfg-advcfg -s 255 /Scsi/NvmeMaxUnmapBlockDescriptorCount
Value of NvmeMaxUnmapBlockDescriptorCount is 255

3. Verify that the config option has been updated:

esxcfg-advcfg -g /Scsi/NvmeMaxUnmapBlockDescriptorCount
Value of NvmeMaxUnmapBlockDescriptorCount is 255
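
For scripted changes, an esxcli equivalent of the esxcfg-advcfg commands above can be used. This is a hedged sketch, assuming the option accepts an integer value (255 here, matching the workaround).

# Set the option to 255, then read it back to confirm
esxcli system settings advanced set -o /Scsi/NvmeMaxUnmapBlockDescriptorCount -i 255
esxcli system settings advanced list -o /Scsi/NvmeMaxUnmapBlockDescriptorCount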