ESXi PSOD in ReportLun path - exposed when the target goes on and off continuously

Article ID: 318441

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Due to a race condition, an ESXi host running NFNIC driver 4.0.0.63 or 4.0.0.65 experiences a PSOD during rapid FC topology changes.
  • /var/run/log/vmkernel.log or /var/core/vmkernel-zdump.1 shows the snippets below. They are logged after one of the final link flaps that triggered the PSOD.

    YYYY-MM-DDTHH:MM:SSZ cpu38:2098030)nfnic: <1>: INFO: fnic_tport_exch_reset: 4464: Tport exch reset: target id: 14 tport->fcid: 0x0a1100
    YYYY-MM-DDTHH:MM:SSZ  cpu38:2098030)nfnic: <1>: INFO: fnic_tport_cleanup_io: 3957: ABTS is pending
    YYYY-MM-DDTHH:MM:SSZ  cpu38:2098030)nfnic: <1>: INFO: fnic_tport_cleanup_io: 3958: IOREQ 0x459b44800740:
    port_id=1376064
    start_time = 231565857095324
    abort event = 0
    requiredlen = 8208
    Status = 12582979
    Message = 25$
    YYYY-MM-DDTHH:MM:SSZ  cpu15:2098187)nfnic: <1>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1696: io_req: 0x459b44800740 sc: 0x430e46988950 tag: 0x750 CMD_FLAGS: 0xc00053 CMD_STATE: FNIC_IOREQ_ABTS_PENDING ABTS pending hdr status: FCPIO_ABORTED scsi_status:$
    YYYY-MM-DDTHH:MM:SSZ cpu15:2098187)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2173: fcpio hdr status: FCPIO_TIMEOUT
    YYYY-MM-DDTHH:MM:SSZ  cpu15:2098187)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2227: io_req: 0x459b44800740 sc: 0x430e46988950 id: 0x750 CMD_FLAGS: 0xc00073 CMD_STATE: FNIC_IOREQ_ABTS_PENDINGhdr status: FCPIO_TIMEOUT ABTS cmpl received
    YYYY-MM-DDTHH:MM:SSZ  cpu15:2098187)WARNING: nfnic: <1>: fnic_process_driverIO: 1517: tport wwpn: 0x50000975a8112233 fcid: 0x0a1100 hstatus: 1 dstatus: 0

     

  • The PSOD is caused by a race condition that occurs when a TPort is removed while it still has an outstanding IO/ABTS or Report LUNs request.


    YYYY-MM-DDTHH:MM:SSZ  cpu15:2098187)Backtrace for current CPU ##, worldID=######, fp=#x###########
    #################################PanicvPanicInt#################################
    #################################Panic_NoSave#################################
    #################################LockCheckSelfDeadlockInt#################################
    #################################MCS_LockWait#################################
    #################################MCSLockWithFlagsWork#################################
    #################################vmk_SpinlockLock#################################
    #################################fnic_fcpio_icmnd_cmpl_handler#################################
    #################################fnic_fcpio_cmpl_handler#################################


  • This issue is observed where NFNIC driver versions 4.0.0.59 through 4.0.0.65 are present.
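
The log signatures quoted in the symptoms above can be searched for in a saved copy of vmkernel.log. The following is a minimal sketch, not a supported tool: the function name `scan_vmkernel_lines` and the signature labels are hypothetical, and the patterns are derived only from the excerpts in this article, so other driver builds may log different text.

```python
import re

# Patterns built from the log excerpts in this article (assumption: the
# function names and message text are stable; the line numbers after the
# function name are matched loosely with \d+ because they can vary).
SIGNATURES = {
    "tport_exch_reset": re.compile(r"fnic_tport_exch_reset: \d+: Tport exch reset"),
    "abts_pending": re.compile(r"fnic_tport_cleanup_io: \d+: ABTS is pending"),
    "itmf_timeout": re.compile(r"fnic_fcpio_itmf_cmpl_handler: \d+: .*FCPIO_TIMEOUT"),
}


def scan_vmkernel_lines(lines):
    """Return a dict mapping each signature label to its number of matching lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To use it, pass the lines of a log file, for example `scan_vmkernel_lines(open("vmkernel.log", errors="replace"))`. Matches only indicate that the host logged the same events shown in this article. They do not by themselves confirm that a PSOD was caused by this issue.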

Environment

VMware vSphere ESXi 6.7
 
VMware vSphere ESXi 7.x

Cause

  • This has been seen in environments where Fibre Channel (FC) links flap rapidly, before certain FC failure mitigation timeouts have completed, while there is ongoing input/output (IO) activity on that link.

Resolution

  • The issue is resolved in NFNIC driver version 4.0.0.70. Refer to Finding IO Drivers in the Broadcom Support Portal for driver download instructions.

    Workaround:
    Until the fixed driver version is installed, Cisco recommends addressing any underlying hardware conditions (SFPs, ports, etc.) that expose this condition.
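
The version ranges stated in this article can be checked against an installed driver version string, such as the one reported for the nfnic VIB by `esxcli software vib list`. The sketch below is illustrative only. The helper names are hypothetical. It assumes that the version string begins with a dotted numeric version, optionally followed by a `-` build suffix, and that versions later than 4.0.0.70 also contain the fix. Versions that this article does not mention are reported as "unknown".

```python
def parse_version(text):
    """Convert a version such as '4.0.0.63-1OEM...' into a tuple of integers.

    Assumption: anything after the first '-' is a build suffix and is ignored.
    """
    dotted = text.strip().split("-")[0]
    return tuple(int(part) for part in dotted.split("."))


# Ranges taken from this article.
AFFECTED_MIN = parse_version("4.0.0.59")
AFFECTED_MAX = parse_version("4.0.0.65")
FIXED = parse_version("4.0.0.70")


def classify_nfnic_version(version):
    """Return 'affected', 'fixed', or 'unknown' for an nfnic driver version."""
    parsed = parse_version(version)
    if AFFECTED_MIN <= parsed <= AFFECTED_MAX:
        return "affected"
    if parsed >= FIXED:  # assumption: later releases keep the fix
        return "fixed"
    return "unknown"  # not covered by the ranges stated in this article
```

For example, `classify_nfnic_version("4.0.0.63")` returns "affected" and `classify_nfnic_version("4.0.0.70")` returns "fixed".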

Additional Information