ESX host becomes unresponsive after a rescan with QLogic iSCSI HW HBA
search cancel

ESX host becomes unresponsive after a rescan with QLogic iSCSI HW HBA

book

Article ID: 304608

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Host is unresponsive in VMware Infrastructure Client and vCenter Server after initiating a rescan
  • The ESX host still responds to pings and is available via SSH
  • A review of /var/log/vmkernel shows:

    Sep 29 01:19:15 ESX03 vmkernel: 0:01:00:53.138 cpu2:1063)scsi0:0:0:0: abort srb=0x8a05c10, cmd=0x3de05d80, state=ACTIVE, r_start=361314 , u_start=361314
    Sep 29 01:19:15 ESX03 vmkernel: 0:01:00:53.138 cpu2:1063)scsi0: qla4xxx_eh_abort: return with status = 2003
    Sep 29 01:19:15 ESX03 vmkernel: 0:01:00:53.138 cpu2:1063)LinSCSI: 3201: Abort failed for cmd with serial=42, status=bad0001, retval=bad0001
    Sep 29 01:19:55 ESX03 vmkernel: 0:01:01:33.138 cpu2:1063)scsi0:0:0:0: abort srb=0x8a05c10, cmd=0x3de05d80, state=ACTIVE, r_start=361314 , u_start=361314
    Sep 29 01:19:55 ESX03 vmkernel: 0:01:01:33.138 cpu2:1063)scsi0: qla4xxx_eh_abort: return with status = 2003
    Sep 29 01:19:55 ESX03 vmkernel: 0:01:01:33.138 cpu2:1063)LinSCSI: 3201: Abort failed for cmd with serial=42, status=bad0001, retval=bad0001
    Sep 29 01:20:35 ESX03 vmkernel: 0:01:02:13.138 cpu2:1063)scsi0:0:0:0: abort srb=0x8a05c10, cmd=0x3de05d80, state=ACTIVE, r_start=361314 , u_start=361314
    Sep 29 01:20:35 ESX03 vmkernel: 0:01:02:13.138 cpu2:1063)scsi0: qla4xxx_eh_abort: return with status = 2003
    Sep 29 01:20:35 ESX03 vmkernel: 0:01:02:13.138 cpu2:1063)LinSCSI: 3201: Abort failed for cmd with serial=42, status=bad0001, retval=bad0001
    Sep 29 01:20:45 ESX03 vmkernel: 0:01:02:23.086 cpu0:1024)Helper: 826: Dumping non-active requests on HELPER_MISC_QUEUE at 9732268740309
    Sep 29 01:20:45 ESX03 vmkernel: 0:01:02:23.086 cpu0:1024)Helper: 826: Dumping active requests on HELPER_MISC_QUEUE at 9732268790887
    Sep 29 01:20:45 ESX03 vmkernel: 0:01:02:23.086 cpu0:1024)Helper: 833: 1: status=2 func=0x68ea3c since=9394359681814
    Sep 29 01:20:45 ESX03 vmkernel: 0:01:02:23.086 cpu0:1024)VMNIX: VmkCall: 351: Waiting for long-running helper request


Resolution

In this case, the HBA configurable parameter AFW_Device_Timeouts is set to off.
When AFW_Device_Timeout is set to on, the HBA firmware ignores the IOCB command timeout values specified by the host completions from the queue.
This configurable option is required for performance and connectivity reasons and needs to be enabled. This setting is enabled by default. We have seen QLogic HBAs come out of the factory sealed with this settings accidentally disabled. Also, it has been reported that a firmware update to these cards may disable these settings.
To ensure that your adapters are running with the correct settings, use the QLogic ISCLI utility to load the factory defaults for the HBA, then enable/change only the settings recommended by the storage vendor.