2025-06-27T02:28:43.524Z cpu19:9098185)Backtrace for current CPU #19, worldID=9098185, fp=0x43204a6f4990
2025-06-27T02:28:43.524Z cpu19:9098185)0x453a2a81bdf0:[0x42000863f679]irdma_irq_spinlock_acquire@(irdman)#<None>+0x1 stack: 0x431da4456480, 0x404a817760, 0x4321d0562c40, 0x4321d0562dd0, 0x4321d0562c48
2025-06-27T02:28:43.524Z cpu19:9098185)0x453a2a81be00:[0x4200086431d8]irndrv_RDMAOpPollComplQueue@(irdman)#<None>+0x49 stack: 0x4321d0562c40, 0x4321d0562dd0, 0x4321d0562c48, 0x369, 0x43204a817130
2025-06-27T02:28:43.524Z cpu19:9098185)0x453a2a81bed0:[0x420008347626]vmk_RDMAPollComplQueue@com.vmware.rdma#1+0x43 stack: 0x42000834760c, 0x453a2a81bfa0, 0x0, 0x0, 0x17
2025-06-27T02:28:43.524Z cpu19:9098185)0x453a2a81bf10:[0x4200085fe5c6]nr_CompletionWorld@(nvmerdma)#<None>+0xeb stack: 0x43204a6d1080, 0x43204a8291d0, 0x0, 0x43200bad0069, 0x43204a8ae850
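When comparing a host's backtrace with the one above, it can help to reduce each frame to its function, module and offset. The frames read innermost first. In this excerpt, `nr_CompletionWorld` (nvmerdma) calls `vmk_RDMAPollComplQueue`, which calls into irdman and stops in `irdma_irq_spinlock_acquire`.

The sketch below assumes the frame text format shown in the excerpt. `parse_backtrace` and `FRAME` are hypothetical names used only for illustration; this is not a VMware tool.

```python
import re

# Frame format as it appears in the excerpt, e.g.
# "0x453a2a81bdf0:[0x42000863f679]irdma_irq_spinlock_acquire@(irdman)#<None>+0x1 stack: ..."
FRAME = re.compile(
    r"\[(0x[0-9a-f]+)\]"       # return address in square brackets
    r"(\w+)@"                  # function name
    r"\(?([\w.]+)\)?"          # module name, with or without parentheses
    r"#(\S+?)\+(0x[0-9a-f]+)"  # version tag, then offset into the function
)


def parse_backtrace(lines):
    """Hypothetical helper: return (function, module, offset) per frame, innermost first."""
    frames = []
    for line in lines:
        m = FRAME.search(line)
        if m:
            frames.append((m.group(2), m.group(3), m.group(5)))
    return frames


# Usage with the four frames from the excerpt (timestamps and most stack words trimmed).
SAMPLE = [
    "0x453a2a81bdf0:[0x42000863f679]irdma_irq_spinlock_acquire@(irdman)#<None>+0x1 stack: 0x431da4456480",
    "0x453a2a81be00:[0x4200086431d8]irndrv_RDMAOpPollComplQueue@(irdman)#<None>+0x49 stack: 0x4321d0562c40",
    "0x453a2a81bed0:[0x420008347626]vmk_RDMAPollComplQueue@com.vmware.rdma#1+0x43 stack: 0x42000834760c",
    "0x453a2a81bf10:[0x4200085fe5c6]nr_CompletionWorld@(nvmerdma)#<None>+0xeb stack: 0x43204a6d1080",
]
for func, module, offset in parse_backtrace(SAMPLE):
    print(f"{module:18} {func}+{offset}")
```

The module pattern accepts both forms seen in the excerpt: `(irdman)` in parentheses and the bare `com.vmware.rdma`.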
2025-05-21T00:10:04.792Z In(182) vmkernel: cpu0:2097582)NVMFEVT:330 Received event 0 (0x4313e9dc4900) for vmhba## event queue.
2025-05-21T00:13:11.792Z In(182) vmkernel: cpu0:2097582)NVMFEVT:330 Received event 1 (0x4313e9dc4900) for vmhba## event queue.
2025-05-21T00:36:10.554Z Wa(180) vmkwarning: cpu35:2099251)WARNING: irdman: irndrv_RDMAOpAllocFastRegPageList:5813: PF Reset ongoing. Operation cannot be executed.
2025-05-21T00:36:10.554Z In(182) vmkernel: cpu35:2099251)nvmerdma:1602 [ctlr 266, queue 0] failed to allocate fast reg page list: Failure
2025-05-21T00:36:10.554Z In(182) vmkernel: cpu35:2099251)nvmerdma:411 [ctlr 266, queue 0] failed to allocate FRMR: Failure
2025-05-21T00:36:10.554Z In(182) vmkernel: cpu35:2099251)nvmerdma:886 [ctlr 266, queue 0] Failed to reset: Failure
2025-05-21T00:36:10.554Z In(182) vmkernel: cpu35:2099251)nvmerdma:1928 [ctlr 266, queue 0] reset failed: Failure
2025-05-21T00:36:10.554Z In(182) vmkernel: cpu35:2099251)NVMEDEV:7939 Controller 266, queue 0 reset complete. Status Failure
2025-05-21T00:36:10.554Z Wa(180) vmkwarning: cpu35:2099251)WARNING: NVMEDEV:8276 Failed to restart admin queue for controller 266, status: Failure
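The second excerpt shows a chain of failures. irdman refuses the fast-register page list allocation while a PF reset is ongoing. nvmerdma then cannot allocate the FRMR, so the queue reset fails, and finally NVMEDEV cannot restart the admin queue for the controller.

A minimal sketch for checking a copied vmkernel log for the same chain follows. It assumes plain string matching on the messages quoted above is sufficient. `find_reset_failures` is a hypothetical helper name, and a match does not by itself confirm the root cause.

```python
import re
from collections import defaultdict

# Message fragments copied from the vmkernel log excerpt in this article.
PF_RESET = "PF Reset ongoing"
FRMR_FAIL = re.compile(r"\[ctlr (\d+), queue (\d+)\] failed to allocate FRMR")
ADMIN_FAIL = re.compile(r"Failed to restart admin queue for controller (\d+)")


def find_reset_failures(lines):
    """Hypothetical helper: scan log lines for the failure chain shown above.

    Returns (pf_reset_seen, affected). `affected` maps a controller number
    to the ordered list of matched events, limited to controllers whose
    admin queue failed to restart.
    """
    pf_reset_seen = False
    events = defaultdict(list)
    for line in lines:
        if PF_RESET in line:
            pf_reset_seen = True
        m = FRMR_FAIL.search(line)
        if m:
            events[int(m.group(1))].append("frmr_alloc_failed")
        m = ADMIN_FAIL.search(line)
        if m:
            events[int(m.group(1))].append("admin_queue_restart_failed")
    # Keep only controllers that reached the final failure.
    affected = {
        ctlr: evs for ctlr, evs in events.items()
        if "admin_queue_restart_failed" in evs
    }
    return pf_reset_seen, affected


# Usage with three lines from the excerpt (timestamps trimmed).
SAMPLE = """\
WARNING: irdman: irndrv_RDMAOpAllocFastRegPageList:5813: PF Reset ongoing. Operation cannot be executed.
nvmerdma:411 [ctlr 266, queue 0] failed to allocate FRMR: Failure
WARNING: NVMEDEV:8276 Failed to restart admin queue for controller 266, status: Failure
"""
seen, affected = find_reset_failures(SAMPLE.splitlines())
print(seen, affected)
```

To run it against a real log, pass the lines of a copy of the host's vmkernel log, for example an open file object, instead of `SAMPLE.splitlines()`.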
VMware vSphere ESX
NVMe RDMA
There are two issues causing this:
Both VMware and Intel engineering teams are aware of this issue and are working on a resolution: a code fix from VMware and a new driver and firmware from Intel.
Workaround
Do not enable NVMe over RDMA (NVMe/RDMA) until a fix is available.