Interpreting SCSI sense codes in VMware ESXi


Article ID: 337796


Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

In the vmkernel.log file on ESXi 6.x/7.x/8.x hosts, you see warning messages similar to:

19:26:42.068 cpu0:4103)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4100070e8e80) to NMP device "naa.********************" failed on physical path "vmhba2:C0:T1:L27" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

00:13:23.910 cpu20:4251)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100010bf9c0) to NMP device "naa.********************" failed on physical path "vmhba3:C0:T0:L4" H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

03:44:19.039 cpu4:4100)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020e0b00) to NMP device "naa.********************" failed on physical path "vmhba2:C0:T0:L152" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

23:45:36.239 cpu11:22687)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41000b10f000) to NMP device "naa.********************" failed on physical path "vmhba3:C0:T2:L10" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0x3


Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Resolution

Note: This document is only for informational purposes. It does not reflect an issue with any VMware product.

SCSI status codes recorded in the ESXi logs indicate that an I/O has failed with the given status. Depending on the status received, the condition may be temporary, transient, benign, or fatal for a given workload. Consult the T10 standards documentation to determine the meaning of a status. Some SCSI status codes, when received by an ESXi host, trigger path failover. For more information, see SCSI events that can trigger ESX server to fail a LUN over to another path (broadcom.com).
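As a rough aid for reading the sample messages in the Issue/Introduction section, the following Python sketch pulls out the command opcode, the H: (host), D: (device), and P: (plug-in) status values, and the three sense data values. The regular expression, the NmpFailure type, and parse_nmp_line are hypothetical names written against the sample lines in this article only; they are not a VMware tool, and other vmkernel.log message formats may not match.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, written against the sample vmkernel.log lines in this
# article; it is not a documented ESXi log grammar.
_HEX = r"0x[0-9a-fA-F]+"
NMP_FAILURE_RE = re.compile(
    rf"Command (?P<opcode>{_HEX}) .*?"
    rf"H:(?P<host>{_HEX}) D:(?P<device>{_HEX}) P:(?P<plugin>{_HEX}) "
    rf"(?P<validity>Valid|Possible) sense data: "
    rf"(?P<key>{_HEX}) (?P<asc>{_HEX}) (?P<ascq>{_HEX})"
)


class NmpFailure(NamedTuple):
    opcode: int        # SCSI operation code, e.g. 0x28 = READ(10), 0x2a = WRITE(10)
    host: int          # H: host status
    device: int        # D: device (SCSI) status
    plugin: int        # P: plug-in status
    sense_valid: bool  # True for "Valid sense data", False for "Possible sense data"
    sense_key: int
    asc: int
    ascq: int


def _hex(text: str) -> int:
    # int() with base 16 accepts the "0x" prefix used in the log.
    return int(text, 16)


def parse_nmp_line(line: str) -> Optional[NmpFailure]:
    """Return the status fields of an NMP failure message, or None if it does not match."""
    m = NMP_FAILURE_RE.search(line)
    if m is None:
        return None
    return NmpFailure(
        opcode=_hex(m["opcode"]),
        host=_hex(m["host"]),
        device=_hex(m["device"]),
        plugin=_hex(m["plugin"]),
        sense_valid=m["validity"] == "Valid",
        sense_key=_hex(m["key"]),
        asc=_hex(m["asc"]),
        ascq=_hex(m["ascq"]),
    )
```

Applied to the last sample message, the sketch returns host status 0x0, device status 0x2, and sense data 0x2 0x4 0x3; the sections that follow explain what each of these values means.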


The information below, taken from T10 and the Linux Documentation Project, explains the codes in the messages above. T10 is a technical committee of INCITS responsible for SCSI storage interfaces; its principal work is the Small Computer System Interface (SCSI), including the family of SCSI-3 projects. These codes are not VMware-specific; they are based on the SCSI standards. For more information, see Introduction to T10.

SCSI Host codes

These codes may come from the firmware on a host adapter or from one of several hosts that an adapter driver controls; in the vmkernel.log messages above, they appear as the H: value. The host_status field takes the values listed below. Their #defines mimic the defines that are visible only within the Linux kernel, where the same names appear without the SG_ERR_ prefix. A copy of these defines can be found in sg_err.h (see Appendix A of The Linux SCSI Generic (sg) HOWTO):

Code   Name                     Description
0x00   SG_ERR_DID_OK            No error
0x01   SG_ERR_DID_NO_CONNECT    Couldn't connect before timeout period
0x02   SG_ERR_DID_BUS_BUSY      Bus stayed busy through timeout period
0x03   SG_ERR_DID_TIME_OUT      Timed out for other reason (often this is an unexpected device selection timeout)
0x04   SG_ERR_DID_BAD_TARGET    Bad target; device not responding?
0x05   SG_ERR_DID_ABORT         Told to abort for some other reason
0x06   SG_ERR_DID_PARITY        Parity error. Older SCSI parallel buses have a parity bit for error detection. This probably indicates a cable or termination problem.
0x07   SG_ERR_DID_ERROR         Internal error detected in the host adapter
0x08   SG_ERR_DID_RESET         The SCSI bus (or this device) has been reset. Any SCSI device on a SCSI bus is capable of instigating a reset.
0x09   SG_ERR_DID_BAD_INTR      Got an interrupt we weren't expecting
0x0a   SG_ERR_DID_PASSTHROUGH   Force command past mid-layer
0x0b   SG_ERR_DID_SOFT_ERROR    The low-level driver wants a retry


For more information, see The Linux SCSI Generic (sg) HOWTO.
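A minimal Python lookup for the H: value, built from the host code table above. It assumes that the H: value in vmkernel.log uses the same numbering as these SG_ERR_DID_* codes for 0x00 through 0x0b; HOST_STATUS and describe_host_status are illustrative names, and codes outside the table are reported as unlisted rather than guessed.

```python
# Host status codes from the table above, keyed by value.
# Assumption: the ESXi H: field uses this numbering for 0x00-0x0b.
HOST_STATUS = {
    0x00: "SG_ERR_DID_OK",
    0x01: "SG_ERR_DID_NO_CONNECT",
    0x02: "SG_ERR_DID_BUS_BUSY",
    0x03: "SG_ERR_DID_TIME_OUT",
    0x04: "SG_ERR_DID_BAD_TARGET",
    0x05: "SG_ERR_DID_ABORT",
    0x06: "SG_ERR_DID_PARITY",
    0x07: "SG_ERR_DID_ERROR",
    0x08: "SG_ERR_DID_RESET",
    0x09: "SG_ERR_DID_BAD_INTR",
    0x0A: "SG_ERR_DID_PASSTHROUGH",
    0x0B: "SG_ERR_DID_SOFT_ERROR",
}


def describe_host_status(code: int) -> str:
    """Map an H: value such as 0x8 to its name from the table above."""
    name = HOST_STATUS.get(code, "not listed in this table")
    return f"H:{code:#x} {name}"


if __name__ == "__main__":
    # H: values taken from the sample messages in this article.
    for h in (0x8, 0x5, 0x2, 0x0):
        print(describe_host_status(h))
```

For the samples, this labels H:0x8 as a reset, H:0x5 as an abort, and H:0x2 as a busy bus, which narrows the investigation to the path and adapter side rather than the target device.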

SCSI Device/Status codes

SCSI status codes appear in the Status byte returned when the processing of a command completes; in the vmkernel.log messages above, this is the D: value. The code values are assigned by the T10 committee. The T10 list reproduced below was accurate as of 20 August 2007. (Do not be concerned that this date looks old; status code assignments change very infrequently.) For more information on status codes, consult the latest revision of SAM-x (SCSI Architecture Model).

Code Name

00h GOOD
02h CHECK CONDITION
04h CONDITION MET
08h BUSY
18h RESERVATION CONFLICT
28h TASK SET FULL
30h ACA ACTIVE
40h TASK ABORTED

For more information, see SCSI Status Codes.
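The D: value can be looked up the same way. This sketch uses only the status codes listed above; SCSI_STATUS, describe_device_status, and has_sense_data are hypothetical names. The has_sense_data check follows the rule that sense data accompanies a CHECK CONDITION status, which is why the last sample message (D:0x2) reports "Valid sense data" while the others report only "Possible sense data".

```python
# SCSI status byte values from the T10 table above (SAM-x).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE",
    0x40: "TASK ABORTED",
}

CHECK_CONDITION = 0x02


def describe_device_status(code: int) -> str:
    """Map a D: value such as 0x2 to its T10 status name."""
    return SCSI_STATUS.get(code, f"status {code:#x} not listed in this table")


def has_sense_data(code: int) -> bool:
    """In this sketch, sense data (key, ASC, ASCQ) is only interpreted for CHECK CONDITION."""
    return code == CHECK_CONDITION
```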


Note: Hexadecimal numbers in the T10 documentation use the NNNh notation, whereas SCSI status codes logged by the ESXi host use the equivalent 0xNNN notation; for example, 0x2 == 02h.

For additional information on SCSI device codes, see Understanding SCSI device/target NMP errors/conditions in ESX/ESXi 4.x and ESXi 5.x/6.0 (broadcom.com).
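Converting between the two hexadecimal notations is a one-line operation in either direction. The helper names below (t10_to_int, int_to_t10) are illustrative only and are not part of any VMware or T10 tool.

```python
def t10_to_int(t10: str) -> int:
    """Parse T10 'NNh' notation, e.g. '18h' -> 0x18."""
    if not t10.lower().endswith("h"):
        raise ValueError(f"expected a value like '02h', got {t10!r}")
    return int(t10[:-1], 16)


def int_to_t10(value: int, digits: int = 2) -> str:
    """Format a value in T10 notation, zero-padded to `digits` hex digits."""
    return f"{value:0{digits}X}h"


if __name__ == "__main__":
    print(t10_to_int("02h") == 0x2)  # the 0x2 == 02h example from the note
    print(int_to_t10(0x18))          # RESERVATION CONFLICT as listed by T10
```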

SCSI Sense Keys

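SCSI sense keys appear in the sense data that is available when a command completes with a CHECK CONDITION status (D:0x2 in the messages above). The sense key identifies the general category of the failure; the additional sense code (ASC) and additional sense code qualifier (ASCQ), described in the next section, identify the specific cause.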

Code Name

0h NO SENSE
1h RECOVERED ERROR
2h NOT READY
3h MEDIUM ERROR
4h HARDWARE ERROR
5h ILLEGAL REQUEST
6h UNIT ATTENTION
7h DATA PROTECT
8h BLANK CHECK
9h VENDOR SPECIFIC
Ah COPY ABORTED
Bh ABORTED COMMAND
Dh VOLUME OVERFLOW
Eh MISCOMPARE

For more information, see SCSI Sense Keys.
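A sketch of a sense key lookup based on the table above. Values Ch and Fh are not in the table, so the function reports them as unlisted rather than naming them. SENSE_KEYS and describe_sense_key are hypothetical names; the range check reflects that the sense key field is four bits wide.

```python
# Sense keys from the T10 table above (Ch and Fh are not listed there).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense_key(key: int) -> str:
    """Map the first 'sense data' value in the log (e.g. 0x2) to its T10 name."""
    if not 0x0 <= key <= 0xF:
        # The sense key field is four bits wide, so anything larger is not a sense key.
        raise ValueError(f"sense key out of range: {key:#x}")
    return SENSE_KEYS.get(key, f"{key:X}h (not listed in this table)")
```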

Note: The same notation difference applies to sense keys; for example, a sense key logged by ESXi as 0x2 is 2h (NOT READY) in the T10 table above.

SCSI Additional Sense Data

SCSI additional sense data takes the form of two bytes in the sense data, which is typically returned by the REQUEST SENSE command. The additional sense code (ASC) byte indicates information about the error or exception reported in the sense key field. The additional sense code qualifier (ASCQ) byte indicates detailed information related to the additional sense code. In the vmkernel.log messages above, the three values after "sense data:" are the sense key, ASC, and ASCQ; for example, 0x2 0x4 0x3 is sense key 2h (NOT READY) with ASC/ASCQ 04/03 (LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED). See the clause describing the REQUEST SENSE command in the SCSI Primary Commands - 3 (SPC-3) draft standard for more information about sense data.
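ESXi already prints the sense key, ASC, and ASCQ, but if you have raw sense bytes (for example, a REQUEST SENSE response captured elsewhere), their positions are fixed by SPC-3. The sketch below handles the fixed format (response codes 70h/71h: sense key in the low four bits of byte 2, ASC in byte 12, ASCQ in byte 13) and the descriptor format (response codes 72h/73h: sense key in the low four bits of byte 1, ASC in byte 2, ASCQ in byte 3). parse_sense is an illustrative name, and the sketch deliberately ignores the remaining fields (information, sense-key-specific data, descriptors).

```python
from typing import Tuple


def parse_sense(buf: bytes) -> Tuple[int, int, int]:
    """Return (sense_key, asc, ascq) from raw SPC-3 sense data."""
    if not buf:
        raise ValueError("empty sense buffer")
    response_code = buf[0] & 0x7F  # bit 7 is the VALID bit in fixed format
    if response_code in (0x70, 0x71):  # fixed format, current/deferred
        if len(buf) < 14:
            raise ValueError("fixed-format sense data shorter than 14 bytes")
        return buf[2] & 0x0F, buf[12], buf[13]
    if response_code in (0x72, 0x73):  # descriptor format, current/deferred
        if len(buf) < 4:
            raise ValueError("descriptor-format sense data shorter than 4 bytes")
        return buf[1] & 0x0F, buf[2], buf[3]
    raise ValueError(f"unsupported response code {response_code:#x}")


if __name__ == "__main__":
    # Fixed-format buffer equivalent to "Valid sense data: 0x2 0x4 0x3".
    fixed = bytes([0x70, 0x00, 0x02, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0, 0x04, 0x03])
    print(parse_sense(fixed))  # (2, 4, 3)
```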

Each entry below consists of three fields, from left to right:

ASC/ASCQ: the ASC value and the ASCQ value, in hexadecimal, separated by a slash (for example, 04/03).
Device codes (for example, DTLPWROMAEBKVF): letters identifying the device types that may use the ASC/ASCQ pair. The device code letters are defined in the T10 list.
Description: the error or exception indicated by the ASC/ASCQ pair.

04/00 DTLPWROMAEBKVF LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE
04/01 DTLPWROMAEBKVF LOGICAL UNIT IS IN PROCESS OF BECOMING READY
04/02 DTLPWROMAEBKVF LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED
04/03 DTLPWROMAEBKVF LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED


For more information, see SCSI Additional Sense Data.
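Putting the sense key and the ASC/ASCQ pair together turns the last sample message into readable text. This is a sketch under two assumptions: the tables contain only the entries shown in this article (the full ASC/ASCQ list is maintained by T10), and the three values are taken in the order printed after "sense data:" in vmkernel.log. decode_sense is a hypothetical helper.

```python
# Sense keys and ASC/ASCQ pairs copied from the tables in this article only.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED",
}


def decode_sense(key: int, asc: int, ascq: int) -> str:
    """Render 'sense data: KEY ASC ASCQ' as '<sense key>: <ASC/ASCQ text>'."""
    key_text = SENSE_KEYS.get(key, f"sense key {key:X}h")
    pair_text = ASC_ASCQ.get(
        (asc, ascq), f"ASC/ASCQ {asc:02X}/{ascq:02X} (look up in the T10 list)"
    )
    return f"{key_text}: {pair_text}"


if __name__ == "__main__":
    # "Valid sense data: 0x2 0x4 0x3" from the last sample message
    print(decode_sense(0x2, 0x4, 0x3))
```

For the sample, this prints "NOT READY: LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED", which points at the storage array side rather than the ESXi host. Pairs that are not in the short table are printed in T10's NN/NN form so they can be looked up directly.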


VMware plug-in transfer codes

For a list of SCSI plug-in NMP conditions and error codes (the P: value in the messages above), see Understanding SCSI plug-in NMP errors/conditions in ESX/ESXi 4.x/5.x/6.0 (broadcom.com).

Additional Information

For information on troubleshooting LUN connectivity, see Troubleshooting LUN connectivity issues on ESXi hosts (broadcom.com).