PSOD crash on CISCO UCS Servers - Backtrace fnic_queue_abort_io_req & fnic_terminate_ioreq

Article ID: 420290

Updated On:

Products

VMware vSphere ESXi 8.0
VMware vSphere ESXi

Issue/Introduction

Symptoms 

  • The VMkernel log snippet below first reports io_req: 0x12345678 in the FNIC_IOREQ_ABTS_COMPLETE state, meaning its abort has already completed.
  • A subsequent log entry then reports the same io_req: 0x12345678 in the FNIC_IOREQ_ABTS_PENDING state.

YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)nfnic: <2>: INFO: fnic_terminate_ioreq: 2971: io_req: 0x12345678 sc: 0xXXXXXXXXX tag: 0xcd CMD_FLAGS: 0x33100 Found IO in FNIC_IOREQ_ABTS_COMPLETE state on lun
YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)nfnic: <2>: INFO: fnic_terminate_ioreq: 2983: dev rst io_req: 0x12345678 sc 0xXXXXXXXXX tag: 0xcd CMD_FLAGS: 0x33100 CMD_STATE: FNIC_IOREQ_ABTS_PENDING  --> The same io_req is now marked as abort pending
YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)World: 3355: PRDA 0x420040400000 ss 0x0 ds 0x750 es 0x750 fs 0x0 gs 0x0
YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)World: 3357: TR 0x768 GDT 0xfffffffffca02888 (0xffff) IDT 0xfffffffffc408000 (0xffff)
YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)World: 3359: CR0 0x80050031 CR3 0x949383b000 CR4 0x142668
YYYY-MM-DDTHH:MM:SS.721Z cpu68:2098356)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x8a (0x45ba4c8a3ec0, 3127449) to dev "naa.XXXXXXXXXXXXXXXXXXXX" on path "vmhba1:C0:T312:L6" Failed:
YYYY-MM-DDTHH:MM:SS.721Z cpu68:2098356)NMP: nmp_ThrottleLogForDevice:3898: H:0x1 D:0x0 P:0x0 . Act:FAILOVER. cmdId.initiator=0x430f4d4a80c0 CmdSN 0x800e001f
2025-11-22T06:59:44.722Z cpu68:2098356)WARNING: NMP: nmp_DeviceRetryCommand:130: Device "naa.XXXXXXXXXXXXXXXXXXXX": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
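
The state change described above can be searched for mechanically. The following is a minimal Python sketch, not a VMware or Cisco tool: the function name `find_abort_state_regressions` is hypothetical, and it assumes the nfnic log lines keep the `io_req: 0x...` and `FNIC_IOREQ_*` tokens exactly as they appear in the snippet above.

```python
import re

# Tokens as they appear in the nfnic lines quoted in this article.
IO_REQ_RE = re.compile(r"io_req:\s*(0x[0-9a-fA-F]+)")
STATE_RE = re.compile(r"FNIC_IOREQ_[A-Z_]+")


def find_abort_state_regressions(lines):
    """Return io_req addresses logged as ABTS_COMPLETE and later as ABTS_PENDING."""
    completed = set()   # io_req addresses already seen in ABTS_COMPLETE
    regressed = []      # addresses seen in ABTS_PENDING after ABTS_COMPLETE
    for line in lines:
        if "nfnic" not in line:
            continue
        req = IO_REQ_RE.search(line)
        state = STATE_RE.search(line)
        if not req or not state:
            continue
        addr = req.group(1).lower()
        if state.group(0) == "FNIC_IOREQ_ABTS_COMPLETE":
            completed.add(addr)
        elif state.group(0) == "FNIC_IOREQ_ABTS_PENDING" and addr in completed:
            if addr not in regressed:
                regressed.append(addr)
    return regressed


# The two nfnic lines from the snippet above, with their placeholders kept.
SAMPLE_LINES = [
    "YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)nfnic: <2>: INFO: fnic_terminate_ioreq: 2971: "
    "io_req: 0x12345678 sc: 0xXXXXXXXXX tag: 0xcd CMD_FLAGS: 0x33100 "
    "Found IO in FNIC_IOREQ_ABTS_COMPLETE state on lun",
    "YYYY-MM-DDTHH:MM:SS.721Z cpu1:2098306)nfnic: <2>: INFO: fnic_terminate_ioreq: 2983: "
    "dev rst io_req: 0x12345678 sc 0xXXXXXXXXX tag: 0xcd CMD_FLAGS: 0x33100 "
    "CMD_STATE: FNIC_IOREQ_ABTS_PENDING",
]

if __name__ == "__main__":
    print(find_abort_state_regressions(SAMPLE_LINES))
```

To run it against a real log, pass the lines of a VMkernel log file extracted from a support bundle instead of `SAMPLE_LINES`. A non-empty result only indicates that the pattern is present; it does not by itself confirm the root cause.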

  • The host then crashes with a PSOD. The backtrace shows fnic_queue_abort_io_req, called from fnic_terminate_ioreq, operating on the same io_req: 0x12345678

    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)Backtrace for current CPU #1, worldID=2098306, fp=0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bce0:[0x420015b6bee1]fnic_queue_abort_io_req@(nfnic)#stack: 0x0, 0xXXXXXXXXX, 0xXXXXXXXXX, 0x12345678, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bd40:[0x420015b6eaf2]fnic_terminate_ioreq@(nfnic)#stack: 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXXXXXXXXX, 0x12345678 ---> io-req 
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bda0:[0x420015b6ef86]fnic_term_allio@(nfnic)#stack: 0xXXXXXXXXX, 0x0, 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9be00:[0x420015b6f130]fnic_cleanup@(nfnic)#stack: 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXX, 0xXXXXXXXXX, 0xXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9be10:[0x420015b7163d]fnic_reset@(nfnic)#<None>stack: 0xXX, 0xXXXXXXXXX, 0xXX, 0xXX, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9be30:[0x420015b71800]fnic_host_reset@(nfnic)#stack: 0xXXXXXXXXX, 0x0, 0xXX, 0xXXXXXXXXX, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9be60:[0x420015b755f9]fnic_tport_exch_reset@(nfnic)#stack: 0xXXXXXXXXX, 0xXX, 0x0, 0xXXXXXXXXX, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bef0:[0x420015b5990a]fdls_delete_tport@(nfnic)#stack: 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXXXXXXXXX, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bf10:[0x420015b5bf21]fdls_tport_timer_callback@(nfnic)#stack: 0xXXXXXXXXX, 0xXXXXXXXXX, 0x0, 0x0, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bf60:[0x420014a3a9b2]VmkTimerQueueWorldFunc@vmkernel#stack: 0x0, 0xXX, 0xXX, 0xXXXXXXXXX, 0xXXXXXXXXX
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9bfe0:[0x4200150dc88e]CpuSched_StartWorld@vmkernel#stack: 0x0, 0xXXXXXXXXX, 0x0, 0x0, 0x0
    YYYY-MM-DDTHH:MM:SS.738Z cpu1:2098306)0x4539e1a9c000:[0x420014b453af]Debug_IsInitialized@vmkernel#stack: 0x0, 0x0, 0x0, 0x0, 0x0
    YYYY-MM-DDTHH:MM:SS.747Z cpu1:2098306)VMware ESXi 8.0.3 [Releasebuild-24859861 x86_64]
    #PF Exception 14 in world 2098306:tq:tq-iport IP 0x420015b6bee1 addr 0x58
  • The nfnic driver version in use is 5.0.0.46
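
To compare a PSOD backtrace with the one shown above, the frames can be parsed and the top two nfnic functions checked. This is a hedged Python sketch only: the helper names `parse_frames`, `matches_signature` and `fault_info` are hypothetical, and the regular expressions assume the frame and `#PF Exception 14` line layout seen in this article.

```python
import re

# Frame layout assumed from this article: ...:[0x<ip>]<function>@(<module>)#...
FRAME_RE = re.compile(r"\[(0x[0-9a-fA-F]+)\](\w+)@\(?(\w+)\)?")
# Page-fault line layout assumed from this article.
PF_RE = re.compile(
    r"#PF Exception 14 in world \d+:\S+ IP (0x[0-9a-fA-F]+) addr (0x[0-9a-fA-F]+)"
)


def parse_frames(lines):
    """Return (ip, function, module) tuples in the order they are logged."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group(1), m.group(2), m.group(3)))
    return frames


def matches_signature(lines):
    """True if the two innermost nfnic frames match the backtrace in this article."""
    nfnic_funcs = [func for _ip, func, mod in parse_frames(lines) if mod == "nfnic"]
    return nfnic_funcs[:2] == ["fnic_queue_abort_io_req", "fnic_terminate_ioreq"]


def fault_info(lines):
    """Return (faulting IP, faulting address) from the #PF line, or None."""
    for line in lines:
        m = PF_RE.search(line)
        if m:
            return m.group(1), m.group(2)
    return None


# A subset of the backtrace lines above, with their placeholders kept.
SAMPLE_TRACE = [
    "cpu1:2098306)0x4539e1a9bce0:[0x420015b6bee1]fnic_queue_abort_io_req@(nfnic)#stack: 0x0",
    "cpu1:2098306)0x4539e1a9bd40:[0x420015b6eaf2]fnic_terminate_ioreq@(nfnic)#stack: 0xXXXXXXXXX",
    "cpu1:2098306)0x4539e1a9bda0:[0x420015b6ef86]fnic_term_allio@(nfnic)#stack: 0xXXXXXXXXX",
    "cpu1:2098306)0x4539e1a9bf60:[0x420014a3a9b2]VmkTimerQueueWorldFunc@vmkernel#stack: 0x0",
    "#PF Exception 14 in world 2098306:tq:tq-iport IP 0x420015b6bee1 addr 0x58",
]

if __name__ == "__main__":
    print(matches_signature(SAMPLE_TRACE), fault_info(SAMPLE_TRACE))
```

In the sample, the faulting IP reported on the `#PF` line equals the address of the fnic_queue_abort_io_req frame, which is consistent with the fault being raised inside that function.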

Environment

  • ESXi 8.x
  • ESXi 7.x

Cause

  • During the reset recovery process, the io_req has already completed and its completion has been reported back to the upper layers of ESXi.

  • The driver still attempts to abort the same command, which causes the PSOD on the ESXi host.
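
The sequence above can be illustrated with a toy model. This Python sketch is not nfnic source code and does not describe Cisco's actual fix: the classes `IoReq` and `Command` and the functions below are hypothetical stand-ins, used only to show why aborting a request whose command has already been returned to the upper layer fails, and how a state check would avoid it.

```python
from enum import Enum, auto


class IoState(Enum):
    CMD_PENDING = auto()
    ABTS_PENDING = auto()
    ABTS_COMPLETE = auto()


class Command:
    """Hypothetical stand-in for the command owned by the upper layer."""

    def __init__(self, tag):
        self.tag = tag


class IoReq:
    """Hypothetical stand-in for a driver I/O request."""

    def __init__(self, tag):
        self.state = IoState.CMD_PENDING
        self.cmd = Command(tag)


def complete_to_upper_layer(req):
    # The abort finished and the command was handed back to the upper layer.
    req.state = IoState.ABTS_COMPLETE
    req.cmd = None


def abort_unguarded(req):
    # Marks the request pending again, then touches the released command.
    req.state = IoState.ABTS_PENDING
    return req.cmd.tag  # raises AttributeError when cmd is None


def abort_guarded(req):
    # Skips requests that were already completed.
    if req.state is IoState.ABTS_COMPLETE or req.cmd is None:
        return None
    req.state = IoState.ABTS_PENDING
    return req.cmd.tag


if __name__ == "__main__":
    req = IoReq(0xCD)
    complete_to_upper_layer(req)
    print(abort_guarded(req))      # completed request is skipped
    try:
        abort_unguarded(req)
    except AttributeError as exc:  # toy analogue of the page fault
        print("unguarded abort failed:", exc)
```

In the toy model the unguarded path moves the request from ABTS_COMPLETE back to ABTS_PENDING before failing, mirroring the order of the two log lines in the Symptoms section. The Python exception is only an analogy for the page fault; the real failure happens in kernel context and halts the host.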

Resolution

  • Because nfnic is an async (partner-supplied) driver, engage the Cisco hardware support team to address this driver issue.
  • CISCO - Article link