Messages similar to the following are displayed:
Frequent PowerOn Reset Unit Attentions are occurring on path vmhba0:C1:T0:L0. This may indicate a storage problem. Affected device: naa.###################. Affected datastores: vmdatafiles
cpu0:2659)ScsiCore: 1460: Power-on Reset occurred on vmhbaX:C0:T2:L0
cpu7:2055)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x2a (0x41244038d380, 2056) to dev "naa.###################" on path "vmhbaX:C0:T1:L0" Failed: H:0xb D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE
The /var/log/vmkernel.log file reports SCSI warnings against the paths for one or more devices, indicating permanent device loss (PDL).
The logs may report the H:0x1 SCSI host code ("no connection"), logical unit not supported (H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0), or logical unit not accessible (H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa).
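When vmkernel.log is large, it can help to pull out the H:/D:/P: status triplet and the sense bytes and flag the combinations listed above. The following is a minimal sketch that assumes the line format shown in the examples in this article; the function name classify_scsi_status and its label strings are hypothetical and are not part of any VMware tool. Only the codes named above are labeled; D:0x2 is the standard SCSI CHECK CONDITION status, and the three sense bytes are the sense key, ASC and ASCQ.

```python
import re

# Matches the status triplet and, if present, the three sense bytes, e.g.
#   "H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0"
# Both the "Valid" and "Possible" prefixes seen in this article are accepted.
STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+)\s+D:(0x[0-9a-fA-F]+)\s+P:(0x[0-9a-fA-F]+)"
    r"(?:\s+(?:Valid|Possible) sense data:\s*"
    r"(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+))?"
)

# (sense key, ASC, ASCQ) combinations described in this article.
KNOWN_SENSE = {
    (0x5, 0x25, 0x0): "logical unit not supported",
    (0x2, 0x4, 0xA): "logical unit not accessible",
}


def classify_scsi_status(line):
    """Return a short label for a vmkernel.log line, or None if it has no status."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    host = int(m.group(1), 16)    # H: host status
    device = int(m.group(2), 16)  # D: device (SCSI) status
    if host == 0x1:
        return "no connection (H:0x1)"
    if device == 0x2 and m.group(4) is not None:
        # D:0x2 is CHECK CONDITION; the sense bytes explain the reason.
        sense = tuple(int(x, 16) for x in m.group(4, 5, 6))
        return KNOWN_SENSE.get(sense, "check condition, other sense data")
    return "unclassified"
```

Lines marked "unclassified", such as the H:0xb example earlier in this article, still deserve attention; the sketch only highlights the permanent device loss patterns.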
The output of esxcli storage san fc list is similar to:
FcDevice:
   Adapter: vmhbaX
   Port ID: 000000
   Node Name: 20:##:##:24:##:18:#7:12
   Port Name: 21:##:##:24:##:18:#7:12
   Speed: 8 Gbps
   Port Type: NPort
   Port State: ONLINE
   Model Description: HPE SN1100Q 16Gb 2p FC HBA
   Hardware Version: BK3210407-20 F
   OptionROM Version: 3.68
   Firmware Version: 9.15.05 (d0d5)
   Driver Name: qlnativefc
   Error getting field DriverVersion
Events similar to the following are reported in /var/log/vmkernel.log as well as in the vCenter Server log /var/log/vmware/vpxd/vpxd.log:
YYYY-MM-DDTHH:MM:SS.015Z In(182) vmkernel: cpu36:2098098)StorageFPIN: 1276: Report FC FPIN Congestion Oversubscription event (hostWWPN xxxxxxxxxxxxxxxx tgtWWPN yyyyyyyyyyyyyyyy) to vobd. 231 events have occurred since last report.
YYYY-MM-DDTHH:MM:SS.079Z In(182) vmkernel: cpu45:2098118)StorageFPIN: 1276: Report FC FPIN Congestion Credit Stall event (hostWWPN xxxxxxxxxxxxxxxx tgtWWPN yyyyyyyyyyyyyyyy) to vobd. 6 events have occurred since last report.
YYYY-MM-DDTHH:MM:SS.079Z Wa(180) vmkwarning: cpu52:[REDACTED_ID])WARNING: lpfc : vmhbaX lpfc_els_rcv_fpin_cgn:[REDACTED_ID]: [REDACTED_ID] FPIN CONGESTION WARNING Notification type Credit Stall (x2) Event Duration 10000 mSecs
YYYY-MM-DDTHH:MM:SS.079Z In(182) vmkernel: cpu52:[REDACTED_ID])StorageFPIN: [REDACTED_ID]: Report FC FPIN Congestion Credit Stall event (hostWWPN [MASKED_WWPN] tgtWWPN [MASKED_WWPN]) to vobd. 4 events have occurred since last report.
YYYY-MM-DDTHH:MM:SS.079Z In(182) vmkernel: cpu52:[REDACTED_ID])StoragePath: [REDACTED_ID]: Calling MPP NMP for link event 2 on adapter vmhbaX (hostWWPN=[MASKED_WWPN] targetWWPN=[MASKED_WWPN] targetNum = [MASKED_NUM])
YYYY-MM-DDTHH:MM:SS.079Z Wa(180) vmkwarning: cpu52:[REDACTED_ID])WARNING: NMP: nmpHandleLinkEvent:[REDACTED_ID]: Marking path vmhbaX:C0:T0:L00 flaky on link event 2 with timeoutMS = 20000 flakyMarkTC = [MASKED_ID], reEvalFlakyPathTime = 20000
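To see which initiator/target pairs are affected most, the StorageFPIN lines above can be tallied per event type. This is a minimal sketch that assumes the exact "Report FC FPIN ... events have occurred since last report" wording shown in these examples; the function name summarize_fpin is hypothetical. It also assumes that each reported count covers the interval since the previous report, so summing the counts approximates the total number of events seen in the log.

```python
import re
from collections import Counter

# Matches StorageFPIN summary lines such as:
#   "Report FC FPIN Congestion Credit Stall event (hostWWPN <wwpn> tgtWWPN <wwpn>)
#    to vobd. 6 events have occurred since last report."
FPIN_RE = re.compile(
    r"Report FC FPIN (?P<kind>.+?) event "
    r"\(hostWWPN (?P<host>[^\s)]+) tgtWWPN (?P<tgt>[^\s)]+)\) to vobd\. "
    r"(?P<count>\d+) events have occurred since last report"
)


def summarize_fpin(lines):
    """Sum reported FPIN event counts per (event type, host WWPN, target WWPN)."""
    totals = Counter()
    for line in lines:
        m = FPIN_RE.search(line)
        if m:
            # Assumption: each count covers only the interval since the last report.
            totals[(m["kind"], m["host"], m["tgt"])] += int(m["count"])
    return totals
```

For example, iterating over summarize_fpin(open("/var/log/vmkernel.log")).most_common() lists the busiest event type and WWPN pair first, which indicates where on the fabric to start looking.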
Events similar to the following are reported in /var/run/log/vobd.log:
YYYY-MM-DDTHH:MM:SS.460Z In(14) vobd[XXXXXXX]: [HardwareCorrelator] XXXXXXXXXXXXXus: [vob.hardware.fpin.fc.congestion.creditstall] FPIN FC credit stall congestion: Host WWPN ############### , target WWPN ###############.
YYYY-MM-DDTHH:MM:SS.460Z In(14) vobd[XXXXXXX]: [HardwareCorrelator] XXXXXXXXXXXXXus: [esx.problem.hardware.fpin.fc.congestion.creditstall] FPIN FC credit stall congestion: Host WWPN ############### , target WWPN ###############.
YYYY-MM-DDTHH:MM:SS.461Z In(14) vobd[XXXXXXX]: [HardwareCorrelator] XXXXXXXXXXXXXus: [vob.hardware.fpin.fc.congestion.creditstall] FPIN FC credit stall congestion: Host WWPN ############### , target WWPN ###############.
YYYY-MM-DDTHH:MM:SS.461Z In(14) vobd[XXXXXXX]: [HardwareCorrelator] XXXXXXXXXXXXXus: [esx.problem.hardware.fpin.fc.congestion.creditstall] FPIN FC credit stall congestion: Host WWPN ############### , target WWPN ###############.
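In the vobd.log sample above, each FPIN event appears twice, once with a vob.hardware.fpin.* identifier and once with an esx.problem.hardware.fpin.* identifier. The following minimal sketch counts only the esx.problem.* entries to avoid double counting; that choice is based solely on this sample, and the function name count_fpin_problems is hypothetical. The regular expression assumes the "Host WWPN <wwpn> , target WWPN <wwpn>." wording shown here.

```python
import re
from collections import Counter

# Matches the VOB identifier and WWPN pair in vobd.log FPIN lines, e.g.
#   "[esx.problem.hardware.fpin.fc.congestion.creditstall] FPIN FC credit stall
#    congestion: Host WWPN <wwpn> , target WWPN <wwpn>."
VOBD_FPIN_RE = re.compile(
    r"\[(?P<vob>(?:vob|esx\.problem)\.hardware\.fpin\.[\w.]+)\].*?"
    r"Host WWPN (?P<host>\S+)\s*,\s*target WWPN (?P<tgt>[^\s.]+)"
)
PROBLEM_PREFIX = "esx.problem.hardware.fpin."


def count_fpin_problems(lines):
    """Count esx.problem FPIN events per (event type, host WWPN, target WWPN)."""
    counts = Counter()
    for line in lines:
        m = VOBD_FPIN_RE.search(line)
        # Skip the vob.* duplicate of each event; keep only esx.problem.* entries.
        if m and m["vob"].startswith(PROBLEM_PREFIX):
            event = m["vob"][len(PROBLEM_PREFIX):]  # e.g. "fc.congestion.creditstall"
            counts[(event, m["host"], m["tgt"])] += 1
    return counts
```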
VMware vSphere ESXi 8.0.x
Fabric Performance Impact Notification (FPIN) support was added in ESXi 8.0 U2 to help identify fabric-related issues and events. The FPIN module also logs to /var/log/vmkernel.log when fabric events occur. The events that FPIN tracks and reports are:
When FPIN events are received, investigate the health of the switching fabric immediately. In the example above, the initiator is receiving FPINs that indicate both congestion and a credit stall. A credit stall is reported when the buffer-to-buffer (B2B) credits for an ISL connection are approaching zero, which typically happens when a slow-drain device is present in the fabric. Regardless of the event type, the presence of FPIN events means the fabric must be investigated for the root cause; otherwise, fabric health may continue to degrade and an outage could occur.