I/Os might fail on some paths of a device, or datastore creation might fail, if there is a flaky path

Article ID: 345238

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

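This article describes the I/O errors and datastore creation failures that can occur when a device has a flaky path, and how to resolve them.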

Symptoms:
  • I/Os might fail on some paths of a device because of errors from a faulty switch.
  • Datastore creation on a device that has one flaky path fails with the following error:
 The "Create VMFS datastore" operation failed for the entity with the following error message.
An error occurred during host configuration.
Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device
The vmkernel.log file reports messages similar to the following:
2018-08-27T06:32:09.964Z cpu40:1001390104)NMP: nmp_ThrottleLogForDevice:3781: H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL. cmdId.initiator=0x43064aecc6c0 CmdSN 0x9a
2018-08-27T06:32:09.964Z cpu40:1001390104)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600601603b303c009271dacce40de811" state in doubt; requested fast path state update...
2018-08-27T06:32:09.964Z cpu40:1001390104)ScsiDeviceIO: SCSICompleteDeviceCommand:3294: Cmd(0x45a2ddcbd080) 0x2a, CmdSN 0x9a from world 1001393007 to dev "naa.600601603b303c009271dacce40de811" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2018-08-27T06:32:09.976Z cpu40:1001390104)vmw_psp_rr: psp_rrCommandComplete:1972: Setting path vmhba4:C0:T1:L15 as flaky, cmd=0x45a2ddcbd080
2018-08-27T06:32:09.976Z cpu40:1001390104)vmw_psp_rr: psp_rrCommandComplete:1990: Setting eval time for path vmhba4:C0:T1:L15 as 10sec, cmd=0x45a2ddcbd080
2018-08-27T06:32:09.989Z cpu40:1001393041 opID=313e2b0b)ScsiHandle: SCSIOpenNamedDevice:752: handle=0x0x43064aecc6c0 (naa.600601603b303c009271dacce40de811 part 1) is read-only
2018-08-27T06:32:09.989Z cpu40:1001393041 opID=313e2b0b)LVM: ProbeDeviceInt:9585: Failed to detect if device <naa.600601603b303c009271dacce40de811:1> is a snapshot: Device does not contain a logical volume
2018-08-27T06:32:09.989Z cpu40:1001393041 opID=313e2b0b)LVM: InitDevice:10462: LVMProbeDevice failed on (3444138432, naa.600601603b303c009271dacce40de811:1): Device does not contain a logical volume
2018-08-27T06:32:09.989Z cpu40:1001393041 opID=313e2b0b)FSS: Create:2311: Failed to format LVM/VMFS on device 'naa.600601603b303c009271dacce40de811:1': Storage initiator error
2018-08-27T06:32:09.989Z cpu40:1001393041 opID=313e2b0b)FSS: Create:2373: Failed to create FS on dev [naa.600601603b303c009271dacce40de811:1]
  • The guest OS in a virtual machine remounts its file systems in read-only mode or becomes unresponsive.
 In this scenario, the vmkernel.log file reports FCPIO_DATA_CNT_MISMATCH, with messages similar to the following:
2017-05-28T23:47:40.089Z cpu27:32809)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH
2017-05-28T23:47:40.089Z cpu19:35295)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x28 (0x43a600ef10c0, 33236) to dev "naa.60060e80101e1970058be1670000002a" on path "vmhba2:C0:T1:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
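
The log signatures above can be searched for mechanically. The following is a minimal sketch, assuming the message formats match the excerpts in this article. The function name scan_vmkernel_log and the regular expressions are illustrative and are not part of any VMware tool; log formats can differ between ESXi builds.

```python
import re

# Matches the Round Robin PSP message that marks a path as flaky
# (format taken from the excerpt in this article).
FLAKY_RE = re.compile(
    r"vmw_psp_rr: psp_rrCommandComplete:\d+: Setting path (\S+) as flaky"
)
# Matches a command failure with host status H:0x7 reported against a device.
HOST_ERR_RE = re.compile(r'dev "(naa\.[0-9a-f]+)".*?H:0x7 D:0x0 P:0x0')


def scan_vmkernel_log(lines):
    """Return (flaky_paths, failed_devices) found in an iterable of log lines."""
    flaky_paths, failed_devices = set(), set()
    for line in lines:
        match = FLAKY_RE.search(line)
        if match:
            flaky_paths.add(match.group(1))
        match = HOST_ERR_RE.search(line)
        if match:
            failed_devices.add(match.group(1))
    return flaky_paths, failed_devices


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    '2018-08-27T06:32:09.964Z cpu40:1001390104)ScsiDeviceIO: SCSICompleteDeviceCommand:3294: Cmd(0x45a2ddcbd080) 0x2a, CmdSN 0x9a from world 1001393007 to dev "naa.600601603b303c009271dacce40de811" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.',
    '2018-08-27T06:32:09.976Z cpu40:1001390104)vmw_psp_rr: psp_rrCommandComplete:1972: Setting path vmhba4:C0:T1:L15 as flaky, cmd=0x45a2ddcbd080',
]

flaky, failed = scan_vmkernel_log(SAMPLE)
print("Flaky paths:", sorted(flaky))
print("Devices with H:0x7 failures:", sorted(failed))
```

To scan a real log, pass an open file object instead of SAMPLE, for example `scan_vmkernel_log(open("/var/log/vmkernel.log"))` on a host where that path applies.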


Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.7

Cause

If the device is claimed by the Round Robin path selection policy (VMW_PSP_RR) with its default settings, path switching happens every 1000 I/Os. Consequently, if the path currently in use is marked as flaky, I/O does not switch to another working path until 1000 I/Os have succeeded on that path.
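
The effect of the I/O limit can be illustrated with a toy model. This is a simplified sketch, not the ESXi implementation: it counts issued I/Os rather than successful ones, and it assumes two paths, of which the second plays the flaky one. The path name vmhba4:C0:T0:L15 is hypothetical.

```python
from itertools import islice


def round_robin(paths, iops_limit):
    """Yield the path chosen for each I/O, rotating after iops_limit I/Os."""
    index = 0
    while True:
        for _ in range(iops_limit):
            yield paths[index]
        index = (index + 1) % len(paths)


def longest_run_on(path, choices):
    """Return the longest consecutive run of `path` in `choices`."""
    longest = current = 0
    for chosen in choices:
        current = current + 1 if chosen == path else 0
        longest = max(longest, current)
    return longest


# First path name is hypothetical; the second is the flaky path from the log excerpt.
PATHS = ["vmhba4:C0:T0:L15", "vmhba4:C0:T1:L15"]
FLAKY = PATHS[1]

for limit in (1000, 1):
    choices = list(islice(round_robin(PATHS, limit), 4000))
    print(f"limit={limit}: longest run on flaky path = {longest_run_on(FLAKY, choices)}")
```

In this model, a limit of 1000 keeps up to 1000 consecutive I/Os on the flaky path, while a limit of 1 sends at most one I/O there before moving to the next path.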

Resolution

This issue is resolved in ESXi 6.7 Update 1. Refer to the release notes for details.

Workaround:
To work around this issue, set the Round Robin IOPS limit to 1 (iops=1).
For more information, refer to the VMware KB article: Adjusting Round Robin IOPS limit from default 1000 to 1
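
As a sketch of how the workaround might be applied to one device, the helper below assembles the esxcli command line for the Round Robin device configuration. The helper name build_set_iops_command is illustrative, and the device identifier is the one from the log excerpt in this article. Verify the exact syntax against the referenced KB article for your ESXi build before running it on a host.

```python
def build_set_iops_command(device, iops=1):
    """Build the esxcli argument list that sets the Round Robin IOPS limit for one device."""
    if iops < 1:
        raise ValueError("iops must be a positive integer")
    return [
        "esxcli", "storage", "nmp", "psp", "roundrobin", "deviceconfig", "set",
        "--type=iops",          # switch paths based on an I/O count
        f"--iops={iops}",       # number of I/Os sent before switching paths
        f"--device={device}",   # device identifier, for example an naa.* name
    ]


# Device identifier taken from the log excerpt; replace it with your own device.
command = build_set_iops_command("naa.600601603b303c009271dacce40de811")
print(" ".join(command))
```

The printed command is intended to be run in the ESXi Shell or over SSH. The helper only builds the command and does not execute anything.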

Additional Information

Impact/Risks:
None