VMs with NVMe Controllers enabled become unresponsive after upgrading to vSphere 8.0 P05

Article ID: 416995

Updated On:

Products

VMware vSphere ESXi
VMware vSphere ESX 8.x

Issue/Introduction

 

  • Virtual machines on the host become unresponsive to ping, SSH, and RDP requests. They do not show as disconnected, but they remain hung until the ESXi host is rebooted.

  • The vmkernel log (/var/run/log/vmkernel.log) may report out-of-memory errors such as the following:
    YYYY-MM-DDTHH:MM:SS vmkernel: cpu##:#####)VNVME: ###: Error status: Out of memory converted to: 0x0:0x#
    YYYY-MM-DDTHH:MM:SS vmkernel: cpu##:#####)VNVME: ###: Error status: Out of memory converted to: 0x0:0x#

  • The VM's vmware.log (/vmfs/volumes/datastore/vmname/) may report the following pattern of READ and WRITE failures:

    vvol:#####/rfc#####/vmware-#####.log:YYYY-MM-DDTHH:MM:SS In(##) vcpu-# - NVME-VMK: nvme0:0: WRITE Command failed. Status: 0x0/0x4.
    vvol:#####/rfc#####/vmware-#####.log:YYYY-MM-DDTHH:MM:SS In(##) vcpu-# - NVME-VMK: nvme0:0: READ  Command failed. Status: 0x0/0x4
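
To check collected logs for these symptoms, a small script along the following lines may help. It is a minimal sketch: the regular expressions are modelled only on the sample lines shown above, the function name `count_symptoms` is hypothetical, and real log prefixes may differ between builds.

```python
import re

# Patterns derived from the sample lines in this article (assumption:
# the "#" placeholders stand for digits or hex digits in real logs).
VNVME_OOM = re.compile(
    r"VNVME: \d+: Error status: Out of memory converted to: 0x0:0x[0-9a-fA-F]+"
)
NVME_CMD_FAILED = re.compile(
    r"NVME-VMK: nvme\d+:\d+: (READ|WRITE)\s+Command failed\. Status: 0x0/0x4"
)


def count_symptoms(lines):
    """Count log lines that match each symptom pattern."""
    counts = {"vnvme_oom": 0, "read_failed": 0, "write_failed": 0}
    for line in lines:
        if VNVME_OOM.search(line):
            counts["vnvme_oom"] += 1
        match = NVME_CMD_FAILED.search(line)
        if match:
            # group(1) is "READ" or "WRITE"
            counts[match.group(1).lower() + "_failed"] += 1
    return counts
```

Passing the lines of a copied vmkernel.log or vmware.log to `count_symptoms` returns a dictionary of match counts. Non-zero counts are only an indication that the log patterns are present, not a confirmed diagnosis.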


Environment

vSphere 8.x

Cause

Executing NVMe UNMAP (Dataset Management Command - Deallocate) commands on virtual machines with hardware versions earlier than 19 triggers a memory leak. Virtual machine hardware versions below 19 lack native NVMe emulation support, so the hypervisor has to process UNMAP commands through a legacy workflow, and it is this workflow that leaks memory.

Additionally, if the target disk is a Raw Device Mapping (RDM) attached to an NVMe controller, or a disk with CBRC/HBR filters, upgrading the virtual machine hardware version to 19 or later does not mitigate the memory leak. NVMe controllers managing RDM disks or disks with CBRC/HBR filters always use the legacy workflow, regardless of hardware version.
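
The conditions above can be summarised as a small decision helper. This is an illustrative sketch only: the class `NvmeVmProfile` and both function names are hypothetical, and the inputs are assumed to be gathered manually or from an inventory export.

```python
from dataclasses import dataclass


@dataclass
class NvmeVmProfile:
    """Hypothetical description of a VM that has an NVMe controller."""
    hardware_version: int
    rdm_on_nvme: bool = False                 # RDM disk attached to the NVMe controller
    cbrc_or_hbr_filter_on_nvme: bool = False  # disk with CBRC/HBR filters on the controller


def uses_legacy_unmap_workflow(vm):
    """True if UNMAP goes through the legacy workflow described in this article."""
    if vm.rdm_on_nvme or vm.cbrc_or_hbr_filter_on_nvme:
        return True  # legacy workflow regardless of hardware version
    return vm.hardware_version < 19


def hw_upgrade_mitigates(vm):
    """True if upgrading to hardware version 19 or later avoids the legacy workflow."""
    return not (vm.rdm_on_nvme or vm.cbrc_or_hbr_filter_on_nvme)
```

For example, a profile with `hardware_version=17` and no RDM or filtered disks is exposed but can be mitigated by a hardware version upgrade, while a profile with `hardware_version=21` and `rdm_on_nvme=True` remains exposed.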

Resolution

This issue is resolved in ESXi 8.0U3I.

If upgrading the host is not currently an option and the VM is NOT using an RDM disk or a disk with CBRC/HBR filters on the NVMe controller, upgrade the VM hardware version to 19 or later.

If a hardware version upgrade is not possible, disable UNMAP for the virtual machine:

  1. Log in to the vSphere Client.
  2. Power off the VM.
  3. Edit the VM settings in the vCenter UI.
  4. Go to the Advanced Parameters tab and filter for disk.scsiUnmapAllowed.
    • If the attribute is present and set to True, change the value to False.
    • If the attribute is not present, add disk.scsiUnmapAllowed and set its value to False.
  5. Click OK to save the configuration.
  6. Power on the VM.
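
The present/absent logic in step 4 can be sketched as follows. The sketch models the VM's advanced parameters as a plain dictionary and does not call any vSphere API; the function name `disable_unmap` and the dictionary representation are assumptions made for illustration.

```python
UNMAP_KEY = "disk.scsiUnmapAllowed"


def disable_unmap(advanced_params):
    """Return (updated_params, action) for a dict of advanced parameters.

    action is "added", "changed", or "unchanged", mirroring the two
    sub-bullets of step 4 plus the case where the value is already False.
    """
    updated = dict(advanced_params)  # leave the caller's dict untouched
    previous = updated.get(UNMAP_KEY)
    if previous is None:
        updated[UNMAP_KEY] = "False"
        return updated, "added"
    if previous.strip().lower() == "false":
        return updated, "unchanged"
    updated[UNMAP_KEY] = "False"
    return updated, "changed"
```

The returned action makes it easy to report, for a list of VMs, which ones already have the setting and which still need a change window, since the VM must be powered off for the edit.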

 

Note: Upgrading the virtual machine hardware version or disabling UNMAP prevents further leakage, but memory that has already leaked is not reclaimed until the ESXi host is rebooted.