QLogic's qedf Driver receives frequent RJT (Rejected) responses for SCSI Abort commands

Article ID: 391740

Updated On:

Products

  • VMware vSphere ESXi 7.0
  • VMware vSphere ESXi 8.0

Issue/Introduction

Symptom:

  • Datastores frequently disconnect from the host.

Environment

VMware vSphere ESXi

Cause

  • When a storage array rejects an ABORT command, which is a Task Management Function (TMF), it typically does so because the command it was asked to abort never reached the array: the command was dropped somewhere between the initiator and the target. The array cannot abort a command it never received, so it rejects the abort request.

    In "/var/run/log/vobd.log", you will see events similar to the following:

    2025-11-16T12:37:46.791Z In(14) vobd[2097956]:  [vmfsCorrelator] 11283187743902us: [esx.problem.vmfs.heartbeat.timedout] 620126d6-########-####-34800d5e96ec <Datastore_Name>

    In "/var/run/log/hostd.log", you will see events similar to the following:

    2025-11-16T12:37:46.792Z In(166) Hostd[2099631]: [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    key = 126,
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    chainId = 119952232,
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    createdTime = "1970-01-01T00:00:00Z",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    userName = "",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    host = (vim.event.HostEventArgument) {
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       name = "<Host_Name>",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       host = 'vim.HostSystem:ha-host'
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    },
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    ds = (vim.event.DatastoreEventArgument) {
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       name = "<Datastore_Name>",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       datastore = 'vim.Datastore:620126d6-########-####-34800d5e96ec'
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    },
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    eventTypeId = "esx.problem.vmfs.heartbeat.timedout",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    arguments = (vmodl.KeyAnyValue) [
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       (vmodl.KeyAnyValue) {
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->          key = "1",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->          value = "620126d6-########-####-34800d5e96ec"
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       },
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       (vmodl.KeyAnyValue) {
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->          key = "2",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->          value = (vim.event.DatastoreEventArgument) {
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->             name = "<Datastore_Name>",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->             datastore = 'vim.Datastore:620126d6-########-####-34800d5e96ec'
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->          }
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->       }
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    ],
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    objectId = "620126d6-########-####-34800d5e96ec",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    objectType = "vim.Datastore",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: -->    objectName = "<Datastore_Name>",
    2025-11-16T12:37:46.792Z In(166) Hostd[2099271]: --> }
    2025-11-16T12:37:46.793Z In(166) Hostd[2099631]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 140136 : Lost access to volume 620126d6-########-####-34800d5e96ec (<Datastore_Name>) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.


    In "/var/run/log/vmkernel.log", you will see events similar to the following:

    2025-11-16T12:16:21.906Z In(182) vmkernel: cpu46:2098293)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x0 (0x45ba101ae880, 0) to dev "naa.############################" on path "vmhba3:C0:T2:L15" Failed:
    2025-11-16T12:16:21.906Z In(182) vmkernel: cpu46:2098293)NMP: nmp_ThrottleLogForDevice:3898: H:0x7 D:0x0 P:0x0 . Act:NONE. cmdId.initiator=0x4539bc31bc38 CmdSN 0x0

    2025-11-16T12:16:21.916Z In(182) vmkernel: cpu21:2679205)ScsiPath: 5925: Command 0x0 (cmdSN 0x0, World 0) to path vmhba3:C0:T2:L15 timed out: expiry time occurs 12ms in the past
    2025-11-16T12:16:21.916Z In(182) vmkernel: cpu1:2097287)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x0 (0x45ba101ae880, 0) to dev "naa.############################" on path "vmhba3:C0:T2:L15" Failed:
    2025-11-16T12:16:21.916Z In(182) vmkernel: cpu1:2097287)NMP: nmp_ThrottleLogForDevice:3898: H:0x3 D:0x0 P:0x0 . Act:NONE. cmdId.initiator=0x4539bc31bc38 CmdSN 0x0

    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu49:25875129)ql_vmk_wait_for_completion:1296 :: Wokeup: status=Success.
    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu49:25875129)ql_vmk_wait_for_completion:1317 :: Returning VMK_OK
    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu49:25875129)qedf:vmhba3:qedfc_internalAbortIO:2993:Info: abort/virtual reset completed successfully, ref is: 3
    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu49:25875129)qedf:vmhba3:qedfc_eh_abort:3102:Info: Returning: Success for oxid = 0x2b4
    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu32:2097301)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x89 (0x45ba10072c80, 2097256) to dev "naa.############################" on path "vmhba3:C0:T2:L15" Failed:
    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu32:2097301)NMP: nmp_ThrottleLogForDevice:3898: H:0x7 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=0x4309fe22f580 CmdSN 0xc9dd
    2025-11-16T12:37:04.720Z Wa(180) vmkwarning: cpu32:2097301)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.############################" state in doubt; requested fast path state update...
    2025-11-16T12:37:04.720Z In(182) vmkernel: cpu32:2097301)ScsiDeviceIO: 4686: Cmd(0x45ba10072c80) 0x89, CmdSN 0xc9dd from world 2097256 to dev "naa.############################" failed H:0x7 D:0x0 P:0x0
    2025-11-16T12:37:04.749Z In(182) vmkernel: cpu80:2098190)qedf:vmhba3:qedfc_cmd_state_handler:2132:Info: Returning: Success for oxid = 0x252
    2025-11-16T12:37:04.749Z In(182) vmkernel: cpu8:2098552)qedf:vmhba3:qedfc_fp_process_cqes:3558:Info: dummy cqe. xid: 0x252
    2025-11-16T12:37:04.749Z In(182) vmkernel: cpu8:2098552)qedf:vmhba3:qedfc_fp_process_cqes:3545:Info: Abort cqe. xid: 0x252
    2025-11-16T12:37:04.749Z In(182) vmkernel: cpu8:2098552)qedf:vmhba3:qedfc_process_abts_compl:1988:Info: ABTS response - RJT
    2025-11-16T12:37:04.749Z In(182) vmkernel: cpu8:2098552)qedf:vmhba3:qedfc_process_abts_compl:2030:Info: (2:4): Completing cmd with Host Error status (0x7), xid=0x252, SN=800e0007, worldId=18a05f5, refcnt=3 lba=0x146de2fe0 lbc=0x8 cmd 88:0:0:0:0

    2025-11-16T12:37:34.721Z Wa(180) vmkwarning: cpu1:2097279)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.############################" state in doubt; requested fast path state update...
    2025-11-16T12:37:34.721Z In(182) vmkernel: cpu1:2097279)ScsiDeviceIO: 4686: Cmd(0x45b9e8914d40) 0x89, CmdSN 0xc9ec from world 2097256 to dev "naa.############################" failed H:0x7 D:0x0 P:0x0


    2025-11-16T12:37:46.790Z Wa(180) vmkwarning: cpu24:2097309)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.############################" state in doubt; requested fast path state update...
    2025-11-16T12:37:46.790Z In(182) vmkernel: cpu24:2097309)ScsiDeviceIO: 4686: Cmd(0x45b9c3f52500) 0x89, CmdSN 0xc9f0 from world 2097256 to dev "naa.############################" failed H:0x7 D:0x0 P:0x0
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu46:25643222)qedf:vmhba3:qedfc_eh_virtual_reset:3192:Info: Returning: Success IOs aborted: 0, port_id[1e83c0]
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu46:25643222)qedf:vmhba3:qedfc_getCmdSatisfyingACondition:2881:Info: Found the I/O for MatchingWorldID 0x4316a14a98a0:0x726, state=1, ref=2
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu46:25643222)qedf:vmhba3:qedfc_eh_virtual_reset:3162:Info: vr: aborting oxid = 0x726, refcnt = 2, LBA = c900, cmdSN=0, worldId=2099139
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu46:25643222)ql_vmk_wait_for_completion:1291 :: Going to sleep.
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu49:23447462)qedf:vmhba3:qedfc_fp_process_cqes:3558:Info: dummy cqe. xid: 0x726
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu49:23447462)qedf:vmhba3:qedfc_fp_process_cqes:3545:Info: Abort cqe. xid: 0x726
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu49:23447462)qedf:vmhba3:qedfc_process_abts_compl:1988:Info: ABTS response - RJT
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu46:25643222)ql_vmk_wait_for_completion:1296 :: Wokeup: status=Success.
    2025-11-16T12:37:46.791Z In(182) vmkernel: cpu46:25643222)ql_vmk_wait_for_completion:1317 :: Returning VMK_OK

    2025-11-16T12:39:02.339Z In(182) vmkernel: cpu53:25874564)qedf:vmhba4:qedfc_eh_abort:3061:Info: IO not found. Returning Success, cmdSN=298d61b, worldId=25875233
    2025-11-16T12:39:02.339Z In(182) vmkernel: cpu53:25874564)qedf:vmhba3:qedfc_eh_abort:3061:Info: IO not found. Returning Success, cmdSN=298d61b, worldId=25875233


    Note: The "ABTS response - RJT" events indicate that the SCSI ABORT is being rejected by the storage array.
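
    One quick way to confirm the pattern is to pull the timestamps of the "ABTS response - RJT" events and line them up with the heartbeat-timeout entries in vobd.log. A minimal sketch, using a small inline sample in place of /var/run/log/vmkernel.log (the sample lines are illustrative, not from a real host):

```shell
# Two illustrative vmkernel.log lines stand in for the real file.
rjt_sample='2025-11-16T12:37:04.749Z In(182) vmkernel: cpu8:2098552)qedf:vmhba3:qedfc_process_abts_compl:1988:Info: ABTS response - RJT
2025-11-16T12:37:46.791Z In(182) vmkernel: cpu49:23447462)qedf:vmhba3:qedfc_process_abts_compl:1988:Info: ABTS response - RJT'

# Keep only the RJT events and print their timestamps (field 1).
printf '%s\n' "$rjt_sample" | grep 'ABTS response - RJT' | awk '{print $1}'
# → 2025-11-16T12:37:04.749Z
# → 2025-11-16T12:37:46.791Z

# On a live host the equivalent would be:
#   grep 'ABTS response - RJT' /var/run/log/vmkernel.log | awk '{print $1}'
```

    If the RJT timestamps cluster just before the esx.problem.vmfs.heartbeat.timedout events, the rejected aborts and the datastore disconnects are part of the same incident.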

Resolution

  • There are a few reasons why an RJT (REJECTED) status is returned to the initiator, but all of them come down to a Layer 1 (physical) issue:
    • Bad SFP
    • Low light level on the switch port
    • Auto-negotiated switch port speed instead of a forced/statically set link speed

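  • Because these causes are all physical-layer, checking the FC link state and negotiated speed from the host side is a reasonable first step. A hedged sketch (the commands must run on the ESXi host itself; the guard only lets the snippet degrade gracefully elsewhere):

```shell
# esxcli storage san fc list reports Port State and Speed per FC adapter;
# a link negotiating below its rated speed, or a flapping Port State, points
# at the physical layer (SFP, cabling, switch port).
if command -v esxcli >/dev/null 2>&1; then
  esxcli storage san fc list
  # Recent FC link events (link up/down history) per adapter:
  esxcli storage san fc events get
else
  echo "esxcli not found: run these commands on the ESXi host"
fi
```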
  • A helpful troubleshooting step to isolate the issue is to determine whether these rejections are occurring on a single HBA or path, which narrows down which storage fabric needs to be reviewed:

└─$ grep RJT /var/run/log/vmkernel.log | grep -c vmhba3
8557

└─$ grep RJT /var/run/log/vmkernel.log | grep -c vmhba4
1889