ESXi operations stuck due to storage heap exhaustion on hosts with VMs using RDMs
vSphere 7.0 Update 3 and later.
Virtual machines using RDM (Raw Device Mapping) disks.
vSphere 7.0U3 added an optimization to handle unsupported SCSI opcodes and unsupported SCSI inquiry/mode pages. Due to a bug in how the list of unsupported pages is tracked, when a VM issues INQUIRY or MODE SENSE requests to an RDM LUN, the heap in the ESXi storage layer (storageHeap) can be exhausted. The following signatures are observed:
1. A large number of ModeSense request blocking warnings are continually logged in the vmkernel log:
2024-12-DDTHH:MM:SS.XXXZ cpu58:2098352)WARNING: ScsiDeviceIO: 4067: ModeSense 0x1a request failed - blocking page:0x19 subpage:0x0 naa.60060e800899a600005099a600000116
2. The "Percent Free of Max" value for storageHeap in the esxcfg-info output gradually decreases.
When datastore connectivity problems occur, "Percent Free of Max" has reached 0, and a warning that storageHeap is already at its maximum size is logged in the vmkernel log:
2024-12-DDTHH:MM:SS.023Z cpu43:7432806)WARNING: Heap: 3644: Heap storageHeap already at its maximum size. Cannot expand.
# esxcfg-info -a | grep -A11 "storageHeap" | head -n 11
|----Name............................................storageHeap
|----Growable........................................true
|----Max Size........................................704930216 bytes
|----Max Available...................................146816 bytes
|----Current Size....................................704930216 bytes
|----Current Allocation..............................704783400 bytes
|----Current Available...............................146816 bytes
|----Current Releasable..............................32 bytes
|----Percent Free of Current.........................0
|----Percent Free of Max.............................0 <================== Please check this value.
|----Percent Releasable..............................0
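The two symptoms above can be checked with a couple of small shell helpers. This is a minimal sketch, not a VMware-provided tool: the function names are hypothetical, and it assumes the `esxcfg-info -a` layout and the vmkernel warning format shown above (with the device ID as the last field of each warning line).

```shell
# Hypothetical helpers for checking the storageHeap exhaustion symptoms.
# Both read text on stdin, so they work on live output or saved copies.

# Print "Percent Free of Max" for storageHeap from `esxcfg-info -a` output.
storage_heap_pct_free_of_max() {
  awk '
    # Each heap section starts with a "Name....<heap>" line; remember
    # whether the current section belongs to storageHeap.
    /Name\.\.+/ { in_heap = ($0 ~ /storageHeap *$/); next }
    in_heap && /Percent Free of Max/ {
      sub(/.*Percent Free of Max\.+/, "")  # strip the label and dot leader
      print $1                              # the number is the first token
      exit
    }
  '
}

# Count "ModeSense ... request failed - blocking" warnings per device,
# taking the device ID (naa.*) from the last field of each matching line.
count_modesense_blocking() {
  grep 'ModeSense .* request failed - blocking' \
    | awk '{ print $NF }' | sort | uniq -c | sort -rn
}

# Usage on the host:
#   esxcfg-info -a | storage_heap_pct_free_of_max
#   count_modesense_blocking < /var/log/vmkernel.log
```

A "Percent Free of Max" value that keeps dropping toward 0, together with a steadily growing warning count for an RDM device, matches the signature described above.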
As a workaround, the optimization for unsupported SCSI opcodes and unsupported inquiry/mode pages can be disabled through an advanced configuration option:
In the advanced settings of the ESXi host, change Scsi.SCSIBlockUnsupportedOpcodesAndPages from the default 1 (enabled) to 0 (disabled).
Alternatively, run the following command on the ESXi host:
# esxcfg-advcfg -s 0 /Scsi/SCSIBlockUnsupportedOpcodesAndPages
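To confirm the change, the current value can be read back with `esxcfg-advcfg -g`. The small parser below is a sketch that assumes the command's usual `Value of <option> is <n>` output line; the function name is hypothetical.

```shell
# Hypothetical helper: print the value reported by `esxcfg-advcfg -g`,
# assuming its usual "Value of <Option> is <n>" output line.
advcfg_value() {
  awk '/^Value of / { print $NF; exit }'
}

# Usage on the host:
#   esxcfg-advcfg -g /Scsi/SCSIBlockUnsupportedOpcodesAndPages | advcfg_value
```

A result of 0 means the optimization is disabled; 1 means it is still enabled (the default).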
This issue will be fixed in a future ESXi release.