The VMkernel or messages logs show an alert shortly before an HBA stops responding to the driver. The error is similar to:
ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40
Note: This issue only applies if you see this specific alert in the vmkernel or messages log files. If you do not see this message, you are not experiencing this issue.
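To check whether a host has logged the alert quoted above, a search along these lines may help. This is a minimal sketch: find_apic_alerts is a hypothetical helper name, and the pattern assumes that only the numeric fields between "ALERT: APIC:" and "ESR = 0x40" vary from host to host.

```shell
#!/bin/sh
# Sketch only: find_apic_alerts is a hypothetical helper, not a VMware tool.
# It prints every line in the given log files that looks like the APIC alert
# quoted in this article.
find_apic_alerts() {
    # -H prefixes each hit with its file name so several logs can be searched
    # at once; -E enables extended regular expressions.
    # The line number (1823 in the example) and the APIC ID are allowed to vary;
    # the ESR value 0x40 is matched literally.
    grep -H -E 'ALERT: APIC: [0-9]+: APICID 0x[0-9a-fA-F]+ - ESR = 0x40' "$@"
}
```

The function takes one or more log file paths as arguments. It returns a non-zero status when no matching line is found, which corresponds to the case described in the note above.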
The VMkernel or messages logs show that a card has stopped responding to commands:
vmkernel: 6:01:42:36.189 cpu15:4274)<6>qla2xxx 0000:1a:00.0: qla2x00_abort_isp: **** FAILED ****
vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: Failed mailbox send register test
The VMkernel or messages logs show the QLogic HBA card is offline:
vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: ISP error recovery failed - board disabled
The VMkernel or messages logs show a card has stopped responding to commands:
vmkernel: 6:22:52:00.983 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:(0):2530 Mailbox command x23 cannot issue Data: xd00 x2
vmkernel: 6:22:52:32.408 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:0310 Mailbox command x5 timeout Data: x0 x700 x0x4100a2811820
vmkernel: 6:22:52:32.408 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:0345 Resetting board due to mailbox timeout
vmkernel: 6:22:53:02.416 cpu2:4684)<3>lpfc820 0000:15:00.0: 0:2813 Mgmt IO is Blocked d00 - mbox cmd 5 still active
vmkernel: 6:22:53:02.416 cpu2:4684)<3>lpfc820 0000:15:00.0: 0:(0):2530 Mailbox command x23 cannot issue Data: xd00 x2
vmkernel: 6:22:53:33.833 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:0310 Mailbox command x5 timeout Data: x0 x700
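The driver messages quoted above can be searched for in one pass. The following is a rough sketch, assuming the message texts on an affected host match the excerpts in this article; find_hba_failures is a hypothetical helper name, and the list of signatures is taken only from the lines shown here, so it is not exhaustive.

```shell
#!/bin/sh
# Sketch only: find_hba_failures is a hypothetical helper, not a VMware tool.
# It prints log lines matching the qla2xxx and lpfc820 failure messages
# quoted in this article.
find_hba_failures() {
    # Each alternative below is copied from a log excerpt in this article.
    # Asterisks are escaped because they are literal characters in the message.
    pattern='qla2x00_abort_isp: \*\*\*\* FAILED \*\*\*\*'
    pattern="$pattern|Failed mailbox send register test"
    pattern="$pattern|ISP error recovery failed - board disabled"
    pattern="$pattern|Mailbox command x[0-9a-fA-F]+ timeout"
    pattern="$pattern|Resetting board due to mailbox timeout"
    grep -E "$pattern" "$@"
}
```

Pass the paths of the log files to search as arguments. Any hit is only a symptom of a card that has stopped responding; it applies to this issue only if the APIC alert is also present.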
The /var/log/vmkernel.log file shows errors similar to:
ScsiDeviceIO: 2316: Cmd(0x41240074e3c0) 0x1a, CmdSN 0x12ee to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
ScsiDeviceIO: 2316: Cmd(0x41240074e3c0) 0x4d, CmdSN 0x12f1 to dev "mpx.vmhba1:C0:T8:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x35 0x1.
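The ScsiDeviceIO lines above pack several fields into one message: H, D, and P are the host, device, and plugin status codes, followed by the sense key and the additional sense code and qualifier. A small sketch for pulling these fields apart is shown below. parse_scsi_failure is a hypothetical helper name, and the expression assumes the exact field layout of the two example lines.

```shell
#!/bin/sh
# Sketch only: parse_scsi_failure is a hypothetical helper, not a VMware tool.
# It reads log lines on stdin and, for each ScsiDeviceIO failure in the layout
# shown in this article, prints the device name and the status fields.
parse_scsi_failure() {
    # In the examples, device=0x2 is the SCSI CHECK CONDITION status and
    # sense_key=0x5 is ILLEGAL REQUEST. Lines that do not match are dropped (-n).
    sed -n -E 's/.*to dev "([^"]+)" failed H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+) Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+).*/dev=\1 host=\2 device=\3 plugin=\4 sense_key=\5 asc=\6 ascq=\7/p'
}
```

The function is meant to sit at the end of a pipeline, for example after a grep for ScsiDeviceIO in the vmkernel log.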
The VMkernel or messages logs show the controller has stopped responding to commands:
Note: This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.
ESXi 4.1 and later versions introduced interrupt remapping code that is enabled by default. This code is incompatible with some servers. The vendor introduced this technology to route IRQs more efficiently, which should improve performance.
Note: If this issue occurs in the PCI device from which the ESXi/ESX host boots (either locally or using SCSI/RAID), or when the host boots from SAN using an iSCSI/FC HBA, the APIC errors are not logged. To troubleshoot the issue in this case, enable and configure remote syslog logging. For more information, see Configuring syslog on ESXi 5.0 (2003322). Alternatively, you can test this by disabling IRQ remapping.
Several server vendors have released fixes in the form of Server BIOS updates. Contact your server vendor to see if they have a fix available. For IBM models, including but not limited to the IBM BladeCenter HS22 series and System x3400/x3500 and x3600 series systems, see the IBM Knowledge Base article MIGR-5086606 for a firmware update and additional information.
# esxcfg-advcfg -k TRUE iovDisableIR
# auto-backup.sh
# reboot
# esxcfg-advcfg -j iovDisableIR
iovDisableIR=TRUE
ESXi 5.x and ESXi 6.0.x do not provide this parameter as a configurable option in the GUI client. It can only be changed using the esxcli command or PowerCLI.
esxcli command:
# esxcli system settings kernel list -o iovDisableIR
Name          Type  Description                              Configured  Runtime  Default
------------  ----  ---------------------------------------  ----------  -------  -------
iovDisableIR  Bool  Disable Interrupt Routing in the IOMMU   FALSE       FALSE    FALSE
# esxcli system settings kernel set --setting=iovDisableIR -v TRUE
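After changing the setting, the list output can be compared column by column to see whether the new value is already active. Below is a minimal sketch, assuming the output layout shown above, where the last three columns are Configured, Runtime, and Default. check_iov_setting is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check_iov_setting is a hypothetical helper, not a VMware tool.
# It reads the output of
#   esxcli system settings kernel list -o iovDisableIR
# on stdin and reports whether the configured value is the running value.
check_iov_setting() {
    awk '
    # The description contains spaces, so count columns from the right:
    # Configured is third from last, Runtime is second from last.
    $1 == "iovDisableIR" {
        configured = $(NF - 2)
        runtime = $(NF - 1)
        if (configured == runtime)
            print "active: " runtime
        else
            print "reboot pending: configured=" configured " runtime=" runtime
        found = 1
    }
    END {
        if (!found) {
            print "iovDisableIR not found"
            exit 1
        }
    }'
}
```

Pipe the esxcli list command into the function. A "reboot pending" result is expected directly after the set command, because kernel settings take effect at the next boot.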
If the hostd service fails or is not running, the esxcli command does not work. In such cases, you may have to use localcli instead. However, changes made using localcli do not persist across reboots. Therefore, repeat the configuration changes using the esxcli command after the host reboots and the hostd service starts responding, so that they persist across reboots.
esxcli commands as detailed above.
PowerCLI> Connect-VIServer -Server xx.xx.xx.xx -User Administrator -Password passwd
PowerCLI> $myesxcli = Get-EsxCli -VMHost xx.xx.xx.xx
PowerCLI> $myesxcli.system.settings.kernel.list($false, 'iovDisableIR')
Configured : FALSE
Default : FALSE
Description : Disable Interrupt Routing in the IOMMU
Name : iovDisableIR
Runtime : FALSE
Type : Bool
PowerCLI> $myesxcli.system.settings.kernel.set("iovDisableIR","TRUE")
true
PowerCLI> $myesxcli.system.settings.kernel.list($true, 'iovDisableIR')
Configured : TRUE
Default : FALSE
Description : Disable Interrupt Routing in the IOMMU
Name : iovDisableIR
Runtime : FALSE
Type : Bool
/var/log/boot.gz log file confirming that interrupt remapping has been disabled:
TSC: 543432 cpu0:0)BootConfig: 419: iovDisableIR = TRUE
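Because the boot log is gzip-compressed, it has to be decompressed before it can be searched. The following sketch shows one possible check, assuming the entry has the same form as the example line above and that gzip is available on the host. boot_log_confirms_ir_disabled is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: boot_log_confirms_ir_disabled is a hypothetical helper,
# not a VMware tool.
# It succeeds (exit status 0) if the gzip-compressed boot log given as $1
# contains a BootConfig entry reporting iovDisableIR = TRUE.
boot_log_confirms_ir_disabled() {
    # gzip -dc writes the decompressed log to stdout without touching the file.
    # The number after "BootConfig:" (419 in the example) is allowed to vary.
    gzip -dc "$1" | grep 'BootConfig: [0-9]*: iovDisableIR = TRUE' >/dev/null
}
```

Call the function with the path to the boot log after the host has rebooted. The exit status can then drive a simple if statement in a post-reboot check.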