NVMe over TCP intermittent connectivity loss to storage devices or PSODs

Article ID: 323036

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts configured with NVMe over TCP may experience intermittent I/O failures when data digest is enabled.
  • If the I/O failures persist, the ESXi host may fail with a purple diagnostic screen (PSOD).
  • The I/O failures are more pronounced during VM snapshot tasks, such as snapshot consolidation during backups.
  • An error similar to the following appears in vmkernel.log:
YYYY-MM-DDTHH:MM:SS.MMMZ cpu###:#######)nvmetcp:nt_ReceiveC2HData:1225 [ctlr ###, queue #] Failed to receive data digest: Failure
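To check quickly whether a host is logging this error, the log can be searched for the message text. The following is a minimal sketch, not an official tool: the function name count_digest_failures is hypothetical, and the log path in the usage comment assumes the default ESXi location of vmkernel.log.

```shell
# Count vmkernel.log lines that report a failed data digest receive.
# The pattern matches on the function name and message text only, because
# the source line number, controller and queue differ between occurrences.
count_digest_failures() {
  # grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
  # keeps the function's exit status clean in that case.
  grep -c 'nvmetcp:nt_ReceiveC2HData.*Failed to receive data digest' "$1" || true
}

# Usage (assumed default log path on an ESXi host):
#   count_digest_failures /var/run/log/vmkernel.log
```

A non-zero count only indicates that the message is present. Whether it is the cause of an outage still has to be confirmed against the timing of the I/O failures.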



Environment

VMware vSphere ESXi 7.0.x

Cause

Digest is an error-checking feature that applies only to NVMe over TCP. It can be enabled for the header, the data, or both, and is disabled by default. If the data digest field extends beyond a single memory buffer, the nvmetcp driver might not be able to read it. As a result, the driver might drop the NVMe over TCP connection and trigger a queue reset. If this situation repeats, I/O errors start to occur. These errors can make NVMe over TCP storage inaccessible or cause applications to fail, and in the worst case the ESXi host might fail with a purple diagnostic screen.

Resolution

This issue is resolved in VMware ESXi 7.0 Update 3q and later releases. For details, refer to vsphere-esxi-70u3q-release-notes.

If upgrading is not an option, work around this issue by disabling data digest. Run the following command on the ESXi host:

esxcli nvme fabrics connect --digest 0

To verify from the command line whether digest is enabled or disabled, run the following command:

esxcli nvme fabrics connection list

The digest value shown in the output depends on which option was selected during the initial setup of NVMe over TCP and is one of the values listed below. Note: Digest is disabled by default.


-d|--digest=<str>     Enable/disable digest verification (only for NVMe over TCP). Available options are:
                            0: Disable digest verification (default)
                            1: Enable header digest verification
                            2: Enable data digest verification
                            3: Enable header and data digest verification
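Because this issue occurs only when data digest is enabled, the values of interest are 2 and 3. The following sketch translates a digest value into the features it enables, based solely on the option list above. The function names describe_digest and data_digest_enabled are hypothetical helpers, not part of esxcli.

```shell
# Translate a --digest value (0-3) into the features it enables.
describe_digest() {
  case "$1" in
    0) echo "header=off data=off" ;;
    1) echo "header=on data=off" ;;
    2) echo "header=off data=on" ;;
    3) echo "header=on data=on" ;;
    *) echo "unknown digest value: $1" >&2; return 1 ;;
  esac
}

# Succeed (exit status 0) only for values that enable data digest,
# which is the configuration affected by this issue.
data_digest_enabled() {
  case "$1" in
    2|3) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage:
#   describe_digest 3
#   data_digest_enabled 2 && echo "data digest is on; the host may be affected"
```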