Symptoms:
Symptom 1:
- "DIF Error" in vmkernel logs
2020-04-06T08:25:12.154Z cpu74:2098447)qlnativefc: vmhba2(af:0.0): iocb(s) 0x430a045f7340 Returned STATUS.
2020-04-06T08:25:12.154Z cpu74:2098447)qlnativefc: vmhba2(af:0.0): DIF ERROR in cmd: 0x28 Type=0x0 lba=0xb100 actRefTag=0x1000000, expRefTag=0xb100, actAppTag=0x0, expAppTag=0x0, actGuard=0x400, expGuard=0xa671.
2020-04-06T08:25:12.154Z cpu74:2098447)ScsiDeviceIO: 3449: Cmd(0x45a74c43f980) 0x28, CmdSN 0x70af4 from world 2100597 to dev "naa.60060e80165251000001973100000105" failed H:0xf D:0x0 P:0x0 Invalid sense data: 0x1a 0x1b 0x45.
2020-04-06T08:25:12.155Z cpu0:2107305)qlnativefc: vmhba2(af:0.0): iocb(s) 0x430a04577780 Returned STATUS.
2020-04-06T08:25:12.155Z cpu0:2107305)qlnativefc: vmhba2(af:0.0): DIF ERROR in cmd: 0x28 Type=0x0 lba=0xb100 actRefTag=0x1000000, expRefTag=0xb100, actAppTag=0x0, expAppTag=0x0, actGuard=0x400, expGuard=0xa671.
2020-04-06T08:25:12.155Z cpu2:2098444)ScsiDeviceIO: 3449: Cmd(0x459b62387340) 0x28, CmdSN 0x70af5 from world 2100597 to dev "naa.60060e80165251000001973100000105" failed H:0xf D:0x0 P:0x0 Invalid sense data: 0x4f 0x2 0x43.
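To gauge how widespread these errors are, the current and rotated vmkernel logs can be searched from the ESXi shell. This is a minimal sketch assuming the default ESXi log locations; adjust the paths if logging is redirected to a scratch or syslog destination:

grep "DIF ERROR" /var/log/vmkernel.log                 # current log
zcat /var/run/log/vmkernel.*.gz | grep "DIF ERROR"     # rotated logs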
Additional symptoms may include:
- Disk I/O failures reported by virtual machines
- Filesystems in Linux guest operating systems becoming read-only due to underlying disk I/O latency or failures
- An unresponsive or sluggish ESXi host
- A high number of H:0xf SCSI error codes in the vmkernel logs
A large number of issues have been observed with qlnativefc driver versions 3.1.29.0 and 3.1.31.0, but the problem is not limited to these versions.
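To confirm which driver and version an adapter is actually running before matching it against the versions above, the standard ESXi inventory commands can be used:

esxcfg-scsidevs -a                           # maps each vmhba to its driver module
esxcli software vib list | grep qlnativefc   # shows the installed qlnativefc VIB version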
Symptom 2:
- "Data Integrity Field (DIF) Error" in VMkernel logs appears only if the Debugging is enabled in the qlnativefc Driver
2020-07-31T10:01:02.130Z cpu0:66211)qlnativefc: vmhba1(37:0.0): Data phase error, rediscover DIF capability : senseKey = 0x5 : asc = 0x4b : ascq = 0x82
2020-07-31T10:01:02.143Z cpu17:480587)qlnativefc: vmhba1(37:0.0): Data phase error, rediscover DIF capability : senseKey = 0x5 : asc = 0x4b : ascq = 0x82
2020-07-31T10:01:02.144Z cpu0:2122267)qlnativefc: vmhba1(37:0.0): Data phase error, rediscover DIF capability : senseKey = 0x5 : asc = 0x4b : ascq = 0x82
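As a sketch of how to check whether driver debugging is active, the qlnativefc module parameters can be listed with esxcli. The extended-error-logging parameter name used below (ql2xextended_error_logging) is an assumption based on the driver's qla2xxx lineage; verify that it appears in the parameter list on your build before setting it, and note that module parameter changes only take effect after a host reboot:

esxcli system module parameters list -m qlnativefc
esxcli system module parameters set -m qlnativefc -p "ql2xextended_error_logging=1"   # assumed parameter name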
- Disk I/O failures reported by virtual machines
- Unresponsive virtual machines
- An unresponsive or sluggish ESXi host
- hostd reported as unresponsive
- The vmkernel logs show a large number of "state in doubt; requested fast path state update" messages
2020-07-26T22:40:56.347Z cpu17:66374)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.xxx.ID" state in doubt; requested fast path state update...
2020-07-26T22:40:56.546Z cpu17:66374)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.xxx.ID" state in doubt; requested fast path state update...
2020-07-26T22:41:56.346Z cpu0:66374)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.xxx.ID" state in doubt; requested fast path state update...
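A quick count helps quantify "a large number" and narrow down the affected devices. A minimal sketch, assuming the default vmkernel log path:

grep -c "state in doubt" /var/log/vmkernel.log
grep "state in doubt" /var/log/vmkernel.log | grep -o 'naa\.[^"]*' | sort | uniq -c   # occurrences per device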
- I/O errors reported specifically against LUNs that have DIF disabled on the storage array
2020-07-29T06:15:00.439Z cpu23:65624)ScsiDeviceIO: 3015: Cmd(0x439a86176840) 0x28, CmdSN 0x49855f from world 0 to dev "naa.xxx-ID" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2020-07-29T06:15:00.439Z cpu15:2952583)Partition: 427: Failed read for "naa.xxx-ID": I/O error
2020-07-29T06:15:00.439Z cpu15:2952583)Partition: 1007: Failed to read protective mbr on "naa.xxx-ID" : I/O error
- This was mostly observed with the combination of HPE 3PAR (3PARdata) arrays presenting DIF-disabled LUNs and the qlnativefc driver, in which DIF is enabled by default (see the parameter check below)
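Whether DIF is enabled in the driver can be checked from its module parameters. The parameter name ql2xenabledif is an assumption carried over from the qla2xxx driver family, so confirm the exact name in the output:

esxcli system module parameters list -m qlnativefc | grep -i dif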
- The HBA failure counters (Failed Commands, Failed Blocks Read, and Failed Blocks Written) are disproportionately high in such cases; refer to the HBA statistics below (the collection command is shown after the listing)
vmhba1:
Successful Commands: 728172193
Blocks Read: 44452702311
Blocks Written: 13163331407
Read Operations: 325720036
Write Operations: 355132847
Reserve Operations: 6142
Reservation Conflicts: 431652
Failed Commands: 48551498
Failed Blocks Read: 403288824638317
Failed Blocks Written: 311812910615
Failed Read Operations: 48078480
Failed Write Operations: 50916
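For reference, per-adapter Fibre Channel statistics such as the listing above can be collected with:

esxcli storage san fc stats get -A vmhba1   # substitute the affected adapter name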
- The issue was observed with the following driver versions:
vSphere 6.5: qlnativefc 2.1.96.0
vSphere 6.7: qlnativefc 3.1.31.0
The issue was also observed on vSphere 7.0.
- The issue was mostly observed with QLogic QLE2690 single-port 16Gb Fibre Channel adapters, while no issues were reported with the same vendor's QLogic ISP2532-based 8Gb Fibre Channel adapters (an adapter listing command follows)
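To identify the adapter model and driver in use on a host (for example, to distinguish a QLE2690 from an ISP2532-based adapter), list the storage adapters:

esxcli storage core adapter list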