The virtual machine log file (/vmfs/volumes/<datastore>/<vm_name>/vmware.log) has entries indicating memory exhaustion, similar to the following example:
####-##-##T##:##:##.###Z In(05) vcpu-0 - Msg_Post: Error
####-##-##T##:##:##.###Z In(05) vcpu-0 - [vob.fssvec.Lookup.file.failed] File system specific implementation of Lookup[file] failed
####-##-##T##:##:##.###Z In(05) vcpu-0 - [vob.fssvec.Lookup.file.failed] File system specific implementation of Lookup[file] failed
####-##-##T##:##:##.###Z In(05) vcpu-0 - [vob.fssvec.Lookup.file.failed] File system specific implementation of Lookup[file] failed
####-##-##T##:##:##.###Z In(05) vcpu-0 - [msg.literal] Cannot allocate memory
####-##-##T##:##:##.###Z In(05) vcpu-0 - [msg.disk.noBackEnd] Cannot open the disk '<snapshot>.vmdk' or one of the snapshot disks it depends on.
####-##-##T##:##:##.###Z In(05) vcpu-0 - [msg.checkpoint.continuesync.error] An operation required the virtual machine to quiesce and the virtual machine was unable to continue running.
####-##-##T##:##:##.###Z In(05) vcpu-0 - ----------------------------------------
####-##-##T##:##:##.###Z In(05) vcpu-0 - MsgIsAnswered: Using builtin default 'OK' as the answer for 'msg.checkpoint.continuesync.error'
####-##-##T##:##:##.###Z In(05) vcpu-0 - SnapshotVMX_ConsolidateCancel: Requesting snapshot consolidate cancel.
####-##-##T##:##:##.###Z In(05) vcpu-0 - Msg_Post: Error
####-##-##T##:##:##.###Z In(05) vcpu-0 - [msg.poweroff.commitOn] Performing disk cleanup. Cannot power off.
####-##-##T##:##:##.###Z In(05) vcpu-0 - ----------------------------------------
The /var/run/log/vmkernel.log file reports that scsiCmdSlab ran out of memory:
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu43:2447718)WARNING: scsiCmdSlab out of memory
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu49:2447717)WARNING: scsiCmdSlab out of memory
####-##-##T##:##:##.###Z In(182) vmkernel: cpu43:2447718)ScsiFds: 767: Allocate command from childToken failed:Out of memory resID:2447718, originSN:0, originHandle:0x0
####-##-##T##:##:##.###Z In(182) vmkernel: cpu49:2447717)ScsiFds: 767: Allocate command from childToken failed:Out of memory resID:2447717, originSN:0, originHandle:0x0
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu49:2447717)WARNING: ScsiDeviceIO: 233: Out of Memory... Trying from emergency heap
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu39:2447721)WARNING: ScsiDeviceIO: 233: Out of Memory... Trying from emergency heap
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu49:2447717)WARNING: ScsiDeviceIO: 6519: Failed to allocate memory for I/O to device naa.###
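If you need to check many hosts or log bundles for these symptoms, a small script can count the signature messages. The following is a minimal sketch, assuming the log text matches the excerpts above. The function names and the "at least one scsiCmdSlab warning" heuristic are hypothetical and are not an official diagnostic rule.

```python
import re
from collections import Counter

# Signature strings copied from the vmware.log and vmkernel.log excerpts above.
SIGNATURES = {
    "vmx_cannot_allocate": re.compile(r"\[msg\.literal\] Cannot allocate memory"),
    "slab_out_of_memory": re.compile(r"scsiCmdSlab out of memory"),
    "emergency_heap": re.compile(r"Out of Memory\.\.\. Trying from emergency heap"),
    "io_alloc_failed": re.compile(r"Failed to allocate memory for I/O to device"),
}


def count_signatures(lines):
    """Count how many lines match each known symptom signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_slab_exhaustion(counts):
    """Hypothetical heuristic: suspect this issue when vmkernel reports
    scsiCmdSlab exhaustion at least once."""
    return counts["slab_out_of_memory"] > 0


def scan_file(path):
    """Read a log file (e.g. a copy of vmkernel.log) and count signatures."""
    with open(path, errors="replace") as fh:
        return count_signatures(fh)
```

For example, `looks_like_slab_exhaustion(scan_file("vmkernel.log"))` returns `True` for a log that contains the vmkernel warnings shown above.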
VMware vSphere ESXi 8.0.x
During snapshot consolidation or removal, ESXi sends automatic unmap commands to the datastore. By default, these commands are sent at a rate of 100 MB/s.
If the datastore connection cannot keep up with these commands, they queue up in the scsiCmdSlab buffer, and the amount of memory the buffer allocates grows.
Because scsiCmdSlab can only use a limited amount of memory, too many queued commands exhaust the heap and produce errors like the ones shown above.
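The queueing behaviour described above can be illustrated with a toy model. This is a sketch only: it measures the backlog in MB of pending unmap work as a stand-in for queued commands, and the drain rate and slab limit used below are made-up numbers, not real ESXi values.

```python
def simulate_backlog(issue_rate_mb_s, drain_rate_mb_s, slab_limit_mb, seconds):
    """Toy model of unmap commands queuing behind a slow datastore.

    Each simulated second, unmap work arrives at issue_rate_mb_s and the
    datastore completes up to drain_rate_mb_s of it; the rest stays queued.
    Returns (second at which the backlog exceeds slab_limit_mb or None,
    backlog at the end of the run).
    """
    backlog = 0.0
    for t in range(1, seconds + 1):
        backlog += issue_rate_mb_s                   # new unmap work issued
        backlog -= min(backlog, drain_rate_mb_s)     # work the datastore completes
        if backlog > slab_limit_mb:                  # hypothetical buffer limit reached
            return t, backlog
    return None, backlog
```

With a hypothetical datastore that drains 40 MB/s and a limit of 1000 MB, the default rate of 100 MB/s leaves 60 MB queued every second, so `simulate_backlog(100, 40, 1000, 60)` returns `(17, 1020.0)`. At 10 MB/s, `simulate_backlog(10, 40, 1000, 60)` returns `(None, 0.0)` because every second's work is completed before the next second begins.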
To prevent this issue, reduce the number of unmap commands sent to the datastore by lowering the automatic space reclamation rate to 10 MB/s.
To do this, follow the steps in How to throttle the unmap requests on Datastore ( Space Reclamation ).
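When the change must be applied to several datastores, generating the commands can help avoid typos. The sketch below only builds the command line and does not run it. It assumes the ESXi `esxcli storage vmfs reclaim config set` syntax with the `--volume-label`, `--reclaim-method fixed`, and `--reclaim-bandwidth` (MB/s) options. Verify this syntax against the throttling article linked above before use. The helper names are hypothetical.

```python
import shlex


def build_reclaim_command(volume_label, bandwidth_mb_s=10):
    """Build (but do not run) an esxcli argument list that sets a fixed
    automatic-unmap bandwidth for one VMFS datastore.

    Assumed syntax: esxcli storage vmfs reclaim config set
      --volume-label <label> --reclaim-method fixed --reclaim-bandwidth <MB/s>
    """
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be a positive number of MB/s")
    return [
        "esxcli", "storage", "vmfs", "reclaim", "config", "set",
        "--volume-label", volume_label,
        "--reclaim-method", "fixed",
        "--reclaim-bandwidth", str(bandwidth_mb_s),
    ]


def to_shell(argv):
    """Quote each argument so labels containing spaces survive the shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `to_shell(build_reclaim_command("my ds"))` produces a single command string with the datastore label quoted, ready to paste into an ESXi shell session.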
For information on monitoring the automatic unmap I/O, see Monitor automatic unmap I/O issued by ESXi.