After unmapping a LUN, ESXi host fails with a purple diagnostic screen and reports FRAME DROP events



Article ID: 317883


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
After unmapping a LUN from an ESXi 5.5 host, you experience these symptoms:
  • ESXi host fails with a purple diagnostic screen.
  • In the purple diagnostic screen, you see a backtrace similar to:

    2016-01-23T00:10:50.992Z cpu62:33922)WARNING: iodm: vmk_IodmEvent:193: vmhba2: FRAME DROP event has been observed 6 times in the last one minute. This suggests a problem with Fibre Channel link/switch!.
    2016-01-23T00:10:50.997Z cpu37:33222)WARNING: iodm: vmk_IodmEvent:193: vmhba1: FRAME DROP event has been observed 6 times in the last one minute. This suggests a problem with Fibre Channel link/switch!.
    2016-01-23T00:10:51.187Z cpu37:33215)World: 8777: PRDA 0x418049400000 ss 0x0 ds 0x4018 es 0x4018 fs 0x4018 gs 0x4018
    2016-01-23T00:10:51.187Z cpu37:33215)World: 8779: TR 0x4020 GDT 0x412546fe1000 (0x402f) IDT 0x418017ef4000 (0xfff)
    2016-01-23T00:10:51.187Z cpu37:33215)World: 8780: CR0 0x8001003d CR3 0x7abce000 CR4 0x216c
    2016-01-23T00:10:51.253Z cpu37:33215)Backtrace for current CPU #37, worldID=33215, ebp=0x412546fdd120
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd120:[0x4180184becc9][email protected]#v2_2_0_0+0x89 stack: 0x412546fdd180,
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd210:[0x4180185d8e2b]lpfc_handle_fcp_err@<none>#<none>+0xbb7 stack: 0x4125000000c4, 0x418
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd360:[0x4180185d97fe]lpfc_scsi_cmd_iocb_cmpl@<none>#<none>+0x9c2 stack: 0x410ceeb41500, 0
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd520:[0x4180185df0fb]lpfc_sli4_fcp_process_wcqe@<none>#<none>+0xbb stack: 0x412546fdd5b0,
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd590:[0x4180185e64e8]lpfc_sli4_fcp_handle_wcqe@<none>#<none>+0x108 stack: 0x412546fdd600,
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd660:[0x4180185f68d0]lpfc_sli4_handle_eqe@<none>#<none>+0x7b4 stack: 0x410a571c1ae0, 0x41
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd690:[0x4180185f7065]lpfc_sli4_intr_bh_handler@<none>#<none>+0x89 stack: 0x410a571c1360,
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd720:[0x418017e6abe7]IRQBH@vmkernel#nover+0x2e7 stack: 0x412546fdd7e0, 0x2, 0x10000000000
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd7b0:[0x418017e2e9ff]BH_DrainAndDisableInterrupts@vmkernel#nover+0xf3 stack: 0x3, 0x41804
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd7f0:[0x418017e64277]IDT_IntrHandler@vmkernel#nover+0x1af stack: 0x412546fdd910, 0x418018
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd800:[0x418017ef2064]gate_entry@vmkernel#nover+0x64 stack: 0x4018, 0x4018, 0x0, 0x0, 0x0
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd910:[0x4180181a7b7a]Power_HaltPCPU@vmkernel#nover+0x1fe stack: 0xaa1da00, 0x4100355cc000
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdd980:[0x418018051e61]CpuSchedIdleLoopInt@vmkernel#nover+0x4c5 stack: 0x2546fdda20, 0x410a
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddae0:[0x418018057f30]CpuSchedDispatch@vmkernel#nover+0x1630 stack: 0x412546fddb50, 0x25e6
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddb50:[0x418018059275]CpuSchedWait@vmkernel#nover+0x245 stack: 0x1, 0x412546fddb80, 0x6874
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddbb0:[0x418018059a47]CpuSched_SleepUntilTC@vmkernel#nover+0xfb stack: 0x1, 0x3200000000,
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddbe0:[0x41801815574a]SCSI_DelayOnTransientFailure@vmkernel#nover+0x5e stack: 0x2000000000
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddc50:[0x4180181456f1]SCSISyncPathCmdWithRetriesInt@vmkernel#nover+0xd9 stack: 0x200, 0x41
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddca0:[0x418018145794]vmk_ScsiIssueSyncPathCommandWithRetries@vmkernel#nover+0x4c stack: 0
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddcf0:[0x418018aee265][email protected]_lib_cx#0+0x99 stack: 0x4180187a5
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddd60:[0x418018aee3c1][email protected]_lib_cx#0+0xa1 stack: 0x4100
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddda0:[0x418018af19f1]satp_inv_prepareInternalNaviReg@<none>#<none>+0x2d stack: 0x412546fd
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fdde20:[0x418018aefa47]satp_inv_updatePathStates@<none>#<none>+0x1f3 stack: 0x412546fddeb0,
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddef0:[0x418018798fe1][email protected]#v2_2_0_0+0x6d stack: 0x
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddf10:[0x41801878e928][email protected]#v2_2_0_0+0x2c stack: 0x410b8912dac0
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddf30:[0x41801810dc41]SCSIDeviceProbe@vmkernel#nover+0xc1 stack: 0x0, 0x412546fe7000, 0x41
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddfd0:[0x418017e6133a]helpFunc@vmkernel#nover+0x6b6 stack: 0x0, 0x0, 0x0, 0x0, 0x0
    2016-01-23T00:10:51.253Z cpu37:33215)0x412546fddff0:[0x418018056842]CpuSched_StartWorld@vmkernel#nover+0xfa stack: 0x0, 0x0, 0x0, 0x0, 0
    2016-01-23T00:10:51.284Z cpu37:33215)VMware ESXi 5.5.0 [Releasebuild-2456374 x86_64]
    #PF Exception 14 in world 33215:helper31-1 IP 0x4180184becc9 addr 0x410c89af6008


  • In the /var/log/vmkernel.log file on the affected ESXi host, you see an entry similar to:

    2015-12-11T11:57:01.035Z cpu58:33904)WARNING: iodm: vmk_IodmEvent:193: vmhba0: FRAME DROP event has been observed 217 times in the last one minute. This suggests a problem with Fibre Channel link/switch!
    2015-12-11T11:57:12.044Z cpu60:50767)WARNING: iodm: vmk_IodmEvent:193: vmhba0: FRAME DROP event has been observed 400 times in the last one minute. This suggests a problem with Fibre Channel link/switch!
    2015-12-11T11:57:23.001Z cpu48:2805963)WARNING: iodm: vmk_IodmEvent:193: vmhba0: FRAME DROP event has been observed 400 times in the last one minute. This suggests a problem with Fibre Channel link/switch!

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
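To gauge how frequently the FRAME DROP warnings are being logged, you can count them per adapter with grep. The sketch below is self-contained for illustration: it writes the sample excerpt from this article to a temporary file; on an affected host you would point grep at /var/log/vmkernel.log instead.

```shell
# Write the sample log excerpt from this article to a temporary file.
# On a real host, run the grep against /var/log/vmkernel.log instead.
cat > /tmp/vmkernel-sample.log <<'EOF'
2015-12-11T11:57:01.035Z cpu58:33904)WARNING: iodm: vmk_IodmEvent:193: vmhba0: FRAME DROP event has been observed 217 times in the last one minute. This suggests a problem with Fibre Channel link/switch!
2015-12-11T11:57:12.044Z cpu60:50767)WARNING: iodm: vmk_IodmEvent:193: vmhba0: FRAME DROP event has been observed 400 times in the last one minute. This suggests a problem with Fibre Channel link/switch!
EOF

# Count FRAME DROP warnings in the log.
grep -c 'FRAME DROP' /tmp/vmkernel-sample.log   # prints 2 for this sample
```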
 


Environment

VMware vSphere ESXi 5.5

Cause

This issue occurs when a LUN is unmapped using the following steps and no rescan is triggered afterward:
  1. Unmount the VMFS datastore.
  2. Detach the unmounted LUN from the ESXi host.
  3. Unmap the LUN from the storage array for this ESXi host.
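The first two steps above can also be performed from the ESXi Shell with esxcli. The sketch below assumes ESXi 5.5; the datastore label and naa device identifier are placeholders for your environment:

```shell
# Step 1: unmount the VMFS datastore (label is a placeholder).
esxcli storage filesystem unmount -l ExampleDatastore

# Step 2: detach the device backing the unmounted datastore
# (the naa identifier below is a placeholder).
esxcli storage core device set --state=off -d naa.60000000000000000000000000000001

# Step 3 (unmapping the LUN) is performed on the storage array itself.
# Without a subsequent adapter rescan, the host remains exposed to this issue.
```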

Resolution

This issue is resolved in vSphere 5.5 Patch 11, available at VMware Downloads.
 
To work around this issue if you do not want to apply the patch, rescan the adapter on the ESXi host after unmapping the LUN from the storage array. This prevents the purple diagnostic screen.
 
To perform an adapter rescan of the ESXi host:
  1. Go to the ESXi host in the vSphere Web Client.
  2. Click the Manage tab.
  3. Click Storage.
  4. Click Storage Adapters, and select the adapter to rescan from the list.
  5. Click Rescan Adapter.
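The same rescan can be performed from the ESXi Shell with esxcli; the adapter name below is an example:

```shell
# Rescan a single adapter (adapter name is an example).
esxcli storage core adapter rescan --adapter vmhba2

# Or rescan all adapters on the host.
esxcli storage core adapter rescan --all
```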


Additional Information

For more information about rescanning storage adapters, see the Storage Refresh and Rescan Operations section of the vSphere 5.5 Storage guide.
