Hosts cannot read/write to their Virtual Volume (vVols) datastores - READ command failed
search cancel

Hosts cannot read/write to their Virtual Volume (vVols) datastores - READ command failed

book

Article ID: 318011

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:


1. A subset of ESXi hosts in a cluster cannot read/write to their presented Virtual Volume (vVols) datastores. 

2. Protocol endpoints are still accessible from the host 

3. Attempts to create a new test virtual machine fails 

4. Rebooting the host does not help 

5. You are unable to browse config vVols 

6. A network interruption in the environment can lead to this problem  

7. Host displays 'No such device or address' upon listing vVols from vVol directory  

[root@esx-02:/vmfs/volumes/vvol:6000d3100581ac02-0000000000000006] ls
ls: ./naa.6000d3100581ac000000000000002547: No such device or address
ls: ./jump-2: No such device or address
ls: ./naa.6000d3100581ac0000000000000031d0: No such device or address
ls: ./mc-ctf-kerbergross-dc-template-bak: No such device or address
ls: ./naa.6000d3100581ac000000000000000926: No such device or address
ls: ./all-your-ports-are-belong-to-us-grimm: No such device or address
ls: ./naa.6000d3100581ac0000000000000021af: No such device or address
ls: ./GRIMM-meerkat: No such device or address
ls: ./naa.6000d3100581ac0000000000000025c4: No such device or address
ls: ./Jenkins-Kubernetes: No such device or address
ls: ./naa.6000d3100581ac00000000000000258c: No such device or address
ls: ./jump-22.data-strategy.range: No such device or address
ls: ./naa.6000d3100581ac000000000000001aec: No such device or address
ls: ./media-uploader: No such device or address

Log excerpts from a host facing this problem : 

VMkernel.log -

2022-08-29T19:56:53.628Z cpu25:2168990)osfs: OSFS_VmklinkLookup:508: Failure (p [vvol], c 6000d3100581ac02-0000000000000006)
2022-08-29T19:56:53.628Z cpu25:2168990)osfs: OSFS_Lookup:2779: Lookup error: file = naa.6000d3100581ac0000000000000021d0, status = No filesystem on the device
2022-08-29T19:56:53.654Z cpu40:2168991)VVol: ScsiExtentRefreshCapacity:959: Refresh capacity for VVOL(0xe2000026d56f0000): 8388608 -> 8388608
2022-08-29T19:56:53.934Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac0000000000000021d0] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:53.934Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac0000000000000021d0] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:53.934Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac0000000000000021d0] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:53.934Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac0000000000000021d0] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:53.934Z cpu40:2168991)osfs: OSFS_MountChild:4216: Failed to probe OSFS child for device name 'naa.6000d3100581ac0000000000000021d0': No filesystem on the device
2022-08-29T19:56:53.934Z cpu25:2168990)osfs: DebugDumpVmklinkResponse:814: {ID: 17d; type:LOOKUP; pid:[vvol]; cid:6000d3100581ac02-0000000000000006; aid:6000d3100581ac02-0000000000000006; status:No filesystem on the device; bufLen:0
2022-08-29T19:56:53.934Z cpu25:2168990)osfs: OSFS_VmklinkLookup:508: Failure (p [  vvol], c 6000d3100581ac02-0000000000000006)
2022-08-29T19:56:53.934Z cpu25:2168990)osfs: OSFS_Lookup:2779: Lookup error: file = docs.cyber.range, status = No filesystem on the device
2022-08-29T19:56:53.935Z cpu40:2168991)VVol: VVolAllocAndInitDevice:8133: Creating a device for naa.6000d3100581ac000000000000000876 (Type VVOL_OBJTYPE_CONFIG)
2022-08-29T19:56:56.055Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000000876] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:56.056Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000000876] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:56.056Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000000876] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:56.056Z cpu34:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000000876] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-29T19:56:56.056Z cpu40:2168991)osfs: OSFS_MountChild:4216: Failed to probe OSFS child for device name 'naa.6000d3100581ac000000000000000876': No filesystem on the device
2022-08-29T19:56:56.056Z cpu25:2168990)osfs: DebugDumpVmklinkResponse:814: {ID: 17e; type:LOOKUP; pid:[vvol]; cid:6000d3100581ac02-0000000000000006; aid:6000d3100581ac02-0000000000000006; status:No filesystem on the device; bufLen:0

vvold.log -

2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Libs] ObjectStoreFileSystemImpl::UnmountNamespace: BEGIN (opIDS = )
2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: Sending IPC request with opcode 3
2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: Receiving IPC response
2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: IPC finished for opID . OpCode: 195887107 Response: Not found
2022-08-29T19:59:52.936Z error vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: IPC failed for opID . OpCode: 195887107 Error: IPC returned error : Not found
2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: END (opIDS = ) IPC returned error : Not found
2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Libs] ObjectStoreFileSystemImpl::UnmountNamespace: END (opIDS = ) UmountNamespace IPC failed for name () uuid (naa.6000d3100581ac00000000000000258c) vimOpID ()
2022-08-29T19:59:52.936Z error vvold[2100395] [Originator@6876 sub=Default] VVolDevice::UnmountDevice, failed to unmount - could not find device (6000d310-0581-ac02-0000-000000000006/naa.6000d3100581ac00000000000000258c)
2022-08-29T19:59:52.936Z info vvold[2100395] [Originator@6876 sub=Default] VVolDevice:PrepareDeviceForBatchUnbind Preparing device for unbind naa.6000d3100581ac000000000000000891
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: fstype vvol
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid
vvol:6000d3100581ac02-0000000000000006
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-000000000000000f
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-0000000000000011
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-0000000000000010
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] 2100395:VVOLLIB : VVolLib_UserLockVVol:10215: Successfully acquired user lock on VVol naa.6000d3100581ac000000000000000891 (curTime:5552 startTime:5552 timeoutSec:5

2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] ObjectStoreFileSystemImpl::UnmountNamespace: BEGIN (opIDS = )
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: Sending IPC request with opcode 3
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: Receiving IPC response
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: IPC finished for opID . OpCode: 195887107 Response: Not found
2022-08-29T19:59:52.937Z error vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: IPC failed for opID . OpCode: 195887107 Error: IPC returned error : Not found
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] OSFSIpcCall: END (opIDS = ) IPC returned error : Not found
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] ObjectStoreFileSystemImpl::UnmountNamespace: END (opIDS = ) UmountNamespace IPC failed for name () uuid (naa.6000d3100581ac000000000000000891) vimOpID ()
2022-08-29T19:59:52.937Z error vvold[2100395] [Originator@6876 sub=Default] VVolDevice::UnmountDevice, failed to unmount - could not find device (6000d310-0581-ac02-0000-000000000006/naa.6000d3100581ac000000000000000891)
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Default] VVolDevice:PrepareDeviceForBatchUnbind Preparing device for unbind naa.6000d3100581ac000000000000002436
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: fstype vvol
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-0000000000000006
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-000000000000000f
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-0000000000000011
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] GetTypedFileSystems: uuid vvol:6000d3100581ac02-0000000000000010
2022-08-29T19:59:52.937Z info vvold[2100395] [Originator@6876 sub=Libs] 2100395:VVOLLIB : VVolLib_UserLockVVol:10215: Successfully acquired user lock on VVol naa.6000d3100581ac000000000000002436 (curTime:5552 startTime:5552 timeoutSec:5

2022-08-28T22:46:40.622Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000002cb9] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:40.623Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000001fb1] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:40.623Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000002085] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:40.623Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000000ad2] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:40.624Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac0000000000000029e4] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:40.624Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000002a21] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:40.625Z cpu3:2097298)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000002ca8] READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)
2022-08-28T22:46:53.086Z cpu33:2097338)VVol: 2421: [vvolDev: naa.6000d3100581ac000000000000001450]READ command (ioFlags 0x200128, cmd 0) failed: Not supported (0xbad0020)

2022-08-29T18:21:07.269Z cpu66:2129727)VVol: 2298: [vvolDev: naa.6000d3100581ac000000000000002085] Failed to issue command 0x26 (Address is not aligned on required boundary)
2022-08-29T18:21:07.286Z cpu66:2129727)VVol: 2298: [vvolDev: naa.6000d3100581ac000000000000000ad2] Failed to issue command 0x26 (Address is not aligned on required boundary)
2022-08-29T18:21:07.304Z cpu66:2129727)VVol: 2298: [vvolDev: naa.6000d3100581ac0000000000000029e4] Failed to issue command 0x26 (Address is not aligned on required boundary)
2022-08-29T18:21:07.322Z cpu66:2129727)VVol: 2298: [vvolDev: naa.6000d3100581ac000000000000002a21] Failed to issue command 0x26 (Address is not aligned on required boundary)
2022-08-29T18:21:07.341Z cpu66:2129727)VVol: 2298: [vvolDev: naa.6000d3100581ac000000000000002ca8] Failed to issue command 0x26 (Address is not aligned on required boundary)

2022-08-29T18:35:19.049Z cpu92:15182124)osfs: OSFS_MountChild:4216: Failed to probe OSFS child for device name 'naa.6000d3100581ac000000000000001daa': No filesystem on the device
2022-08-29T18:35:19.164Z cpu49:15182103)osfs: OSFS_MountChild:4216: Failed to probe OSFS child for device name 'naa.6000d3100581ac000000000000002e58': No filesystem on the device
2022-08-29T18:35:19.308Z cpu92:15182124)osfs: OSFS_MountChild:4216: Failed to probe OSFS child for device name 'naa.6000d3100581ac000000000000001daa': No filesystem on the device
2022-08-29T18:35:19.615Z cpu49:15182103)osfs: OSFS_MountChild:4216: Failed to probe OSFS child for device name 'naa.6000d3100581ac000000000000002e58': No filesystem on the device

2022-08-29T18:11:23.677Z warning vvold[2100375] [Originator@6876 sub=Default] VasaSession::ConvertSoapError: [SESSION HEALTH] Connection-local error with VP SC5020-360876 [Connected -> TransportError], session appears healthy (last successful command 30164 msecs ago)
2022-08-29T18:11:23.678Z error vvold[2100375] [Originator@6876 sub=Default] VasaOp[#30] ===> FINAL FAILURE getEvents, error (TRANSPORT_FAULT / UNKNOWN FAULT) VP (SC5020-360876) Container (SC5020-360876) timeElapsed=1 msecs (#outstanding 0)
2022-08-29T18:11:23.678Z error vvold[2100375] [Originator@6876 sub=Default] VasaSession::EventPollerCB VP SC5020-360876: getEvents failed (TRANSPORT_FAULT, UNKNOWN FAULT) [session state: Connected]
2022-08-29T18:12:15.300Z error vvold[2100377] [Originator@6876 sub=Default] VasaOp[#31] ===> FINAL FAILURE getEvents, error (INVALID_SESSION / Vasa not initialized. / ) VP (SC5020-360876) Container (SC5020-360876) timeElapsed=21717 msecs (#outstanding 0)
2022-08-29T18:12:15.301Z error vvold[2100377] [Originator@6876 sub=Default] VasaSession::EventPollerCB VP SC5020-360876: getEvents failed (INVALID_SESSION, Vasa not initialized. / ) [session state: OutOfSync]
2022-08-29T18:12:15.583Z error vvold[2100151] [Originator@6876 sub=Default] VasaSession::DoSetContext: setContext for VP SC5020-360876 (url: https://172.21.11.201:3034/vasa-provider/vasa2/vasaService) failed [connectionState: OutOfSync]: INVALID_SESSION (Vasa not initialized. / )
2022-08-29T18:12:15.583Z info vvold[2100151] [Originator@6876 sub=Default] VasaOp::EventPollerCB [#32]: ===> Issuing 'getEvents' to VP [SC5020-360876:TransportError (Outstanding 0/8)]
2022-08-29T18:12:15.583Z error vvold[2100151] [Originator@6876 sub=Default] VasaOp[#32] ===> FINAL FAILURE getEvents, error (INVALID_SESSION / Bad session state (TransportError)) VP (SC5020-360876) Container (SC5020-360876) timeElapsed=242 msecs (#outstanding 0)
2022-08-29T18:12:15.583Z error vvold[2100151] [Originator@6876 sub=Default] VasaSession::EventPollerCB VP SC5020-360876: getEvents failed (INVALID_SESSION, Bad session state (TransportError)) [session state: TransportError]
2022-08-29T18:12:45.593Z info vvold[2100373] [Originator@6876 sub=Default] VasaSession::InitSoap: https://172.21.11.201:3034/vasa-provider/vasa2/vasaService session state (TransportError) - recreating soap


Environment

VMware vSphere ESXi 7.0

Cause


An optimization (SCSIBlockUnsupportedOpcodesAndPages) to VMware’s Pluggable Storage Architecture (PSA) was introduced in ESXi 7.0 U3 which is designed to prevent issuing commands to storage devices after receiving a SCSI Command failure with an Invalid Operation Code.

To add more context to the statement above, ESX will not issue commands failed with "0x5 0x20" after the first failure as target will anyway fail the request with same error. This bitmap is maintained at device level, in case of vVol configuration this will be at PE device level. Since the reads failed, we mark this as unsupported command and start failing further READ requests from PSA itself.

For the impacted vVol Datastores, we can see that Protocol Endpoint (PE) Devices previously failed SCSI Commands with error code “0x5 0x20” which translates to ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE.

Due to the newer SCSIBlockUnsupportedOpcodesAndPages optimization, these devices are flagged with an Unsupported status which is then cached persistently which means that all future SCSI command attempts will fail, thus resulting in the vVol Datastore Read/Write failures we observed. 

 

Resolution

This issue is resolved in vSphere ESXi 7.0 U3f (build number 20036589).

Apply the following workaround below to disable optimization – 

1. SSH to all the ESXi hosts where the vVol PE is presented.

2. Validate whether the parameter is set to 0 : esxcfg-advcfg -g /Scsi/SCSIBlockUnsupportedOpcodesAndPages  

3. Run the command to disable the SCSIBlockUnsupportedOpcodesAndPages feature - esxcfg-advcfg -s 0 /Scsi/SCSIBlockUnsupportedOpcodesAndPages

4. Get a list of all vVol Protocol Endpoint Devices by running the following command - 

# esxcli storage core device list | grep 'Is VVOL PE\|  Display Name'|grep true -B1

5. For each of the devices identified in Step # 4, run the following VSISH command to clear the cached unsupported status, replacing the "Device Name" to reflect each individual Protocol Endpoint -

# vsish -e set /storage/scsifw/devices/<DEVICE NAME>/resetOpcodeAndPageBitmaps 1

Example - 

# esxcfg-advcfg -s 0 /Scsi/SCSIBlockUnsupportedOpcodesAndPages

# vsish -e set /storage/scsifw/devices/naa.600508b1001cb7cd0fed8eec2d95ccf6/resetOpcodeAndPageBitmaps 1