LUNs are not visible after upgrading the host to 7.x with Emulex OneConnect FCoE (brcmfcoe) drivers.


Article ID: 313962


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

After upgrading an ESXi host to version 7.x with the Emulex OneConnect FCoE initiator (brcmfcoe) driver installed, the datastores become inaccessible and you may experience the following symptoms:

  • The vmhba is online, but the associated devices do not appear and are not accessible.

  • Performing an adapter rescan does not restore access to the devices.

  • Connectivity to the devices is restored only when the array controller is rebooted.

  • Each time the host is rebooted, the connection to the storage devices is disrupted again, resulting in loss of access to the storage devices.

In the /var/log/boot.log file, entries similar to the following are seen:

2022-05-12T05:11:02.956Z cpu6:66210)brcmfcoe: lpfc_set_disctmo:4486: 0:(0):0247 Start Discovery Timer state x20 Data: x21 x43084cdb5268 x2 x0
2022-05-12T05:11:02.956Z cpu6:66210)brcmfcoe: lpfc_nlp_set_state:4238: 0:(0):0904 NPort state transition x010000, PLOGI -> PLOGI
2022-05-12T05:11:02.956Z cpu6:66210)brcmfcoe: lpfc_nlp_set_state:4265: 0:(0):2518 NPort x010000, SID x00ffff, flag x40000 add_flag x0 state x1
2022-05-12T05:11:02.956Z cpu6:66210)brcmfcoe: __lpfc_findnode_did:5053: 0:(0):0929 FIND node DID Data: 0x439584e3e950 x10000 x40000 x1000002 0x43084ce58730
2022-05-12T05:11:02.956Z cpu6:66210)brcmfcoe: lpfc_nlp_get:6056: 0:(0):2519 New Ref: ndlp:0x439584e3e950 did x010000 usgmap:x1 refcnt:3
2022-05-12T05:11:02.956Z cpu6:66210)brcmfcoe: lpfc_prep_els_iocb:305: 0:(0):0116 Xmit ELS command x3 to remote NPORT x10000 I/O tag: x3dd, port state: x20 fc_flag:x810228 ref_cnt 3
2022-05-12T05:11:21.941Z cpu20:66210)brcmfcoe: lpfc_sli_sp_handle_rspiocb:3351: 0:0328 Rsp Ring 1 error: IOCB Data: x40000000 x5b547540 x1 x0 x2 x10000 x3dd x14428a36 x0 x0 x0 x0 x0 x0 x0 x0
2022-05-12T05:11:21.941Z cpu20:66210)brcmfcoe: __lpfc_findnode_did:5053: 0:(0):0929 FIND node DID Data: 0x439584e3e950 x10000 x40000 x1000002 0x43084ce58730
2022-05-12T05:11:21.941Z cpu20:66210)brcmfcoe: lpfc_cmpl_els_plogi:1915: 0:(0):0102 PLOGI completes to NPort x10000 Status: x3 Reason x2 Data: x40000 2 x2
2022-05-12T05:11:21.941Z cpu20:66210)brcmfcoe: lpfc_els_retry:3588: 0:(0):0108 No retry ELS command x3 to remote NPORT x10000 Retried:3 Error:x3/x2
2022-05-12T05:11:21.941Z cpu20:66210)WARNING: brcmfcoe: lpfc_cmpl_els_plogi:1949: 0:(0):2753 PLOGI failure DID:010000 Status:x3/x2 State: x1 Ref: 2 Flags: x40000

Note: The preceding log excerpts are only examples. Date, time, and environment-specific values may vary depending on your environment.
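To check whether a host is affected, the boot log can be searched for the PLOGI failure signatures shown above. This is a minimal sketch run from the ESXi shell; the exact message text and log location may differ slightly between driver versions.

# grep -i "PLOGI failure" /var/log/boot.log
# grep -i "No retry ELS command" /var/log/boot.log

Matching entries for a brcmfcoe vmhba point to the PLOGI timeout described in the Cause section below.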

 


Environment

VMware vSphere ESXi 7.0

Cause

A connection between the host and a target port in the Storage Area Network (SAN) fails to be established, which prevents the enumeration of device paths and causes the symptoms above. The log entries show 'Status:x3/x2', indicating that the Host Bus Adapter (HBA) firmware did not receive a response to its Port Login (PLOGI) request. Further investigation revealed that the issue is caused by a driver problem on the target-side HBA.

The PLOGI request initiated by the host successfully reached the SAN, indicating that connectivity from the host's side was established. However, the accept (ACC) response to the PLOGI request never reached the affected host, preventing the HBA from proceeding with the Process Login (PRLI) operation required for higher-level SCSI communication.

After waiting 20 seconds without receiving a response, the HBA firmware failed the PLOGI request due to the timeout, indicating a communication problem between the host and the target-side HBA. Until this is resolved, the host cannot log in to the target port and SCSI communication cannot be established.
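As an additional check from the host side, the FC/FCoE adapter attributes can be listed with esxcli. This is a minimal sketch, assuming the brcmfcoe adapter is exposed under the same esxcli storage san fc namespace used by the workaround below:

# esxcli storage san fc list

An adapter whose port reports as online while its paths and devices remain missing matches the symptom described above.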

 

Resolution

There is currently no permanent solution to this issue.

As a workaround, perform a Fabric Login (FLOGI) reset.
Open the console or an SSH session to the ESXi host and run the following command:

# esxcli storage san fc reset -A vmhbaX

Where vmhbaX is the adapter on which you are performing the FLOGI reset.

After the reset command is executed, the datastores become visible again.
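The following is a minimal end-to-end sketch of the workaround, assuming the affected adapters are named vmhba0 and vmhba1 (substitute the adapter names reported on your host); the rescan and device listing steps are optional confirmation steps:

# esxcli storage core adapter list
# esxcli storage san fc reset -A vmhba0
# esxcli storage san fc reset -A vmhba1
# esxcli storage core adapter rescan -A vmhba0
# esxcli storage core adapter rescan -A vmhba1
# esxcli storage core device list

The adapter list identifies the brcmfcoe vmhba names, and the device list confirms that the LUNs are visible again after the reset.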

NOTE: To execute a command during the ESXi boot process, modify the local.sh file located in the /etc/rc.local.d/ directory.
 
To modify the local.sh file:
  • Open the local.sh file using the vi editor.
  • For each FC adapter, add the reset command mentioned earlier and ensure that it is positioned above the line exit 0 in the script.
  • For example:

  • [root@ESXi-7u3-SBI-04:/etc/rc.local.d] cat local.sh
    #!/bin/sh ++group=host/vim/vmvisor/boot
    
    # local configuration options
    
    # Note: modify at your own risk!  If you do/use anything in this
    # script that is not part of a stable API (relying on files to be in
    # specific places, specific tools, specific output, etc) there is a
    # possibility you will end up with a broken system after patching or
    # upgrading.  Changes are not supported unless under direction of
    # VMware support.
    
    # Note: This script will not be run when UEFI secure boot is enabled.
    localcli network nic list > /vmfs/volumes/SBI-02/nic.txt
    
    localcli storage san fc reset -A vmhba1
    localcli storage san fc reset -A vmhba0
    
    exit 0
  • Save the changes made to the local.sh file.
With these changes in place, the reset command runs automatically after each reboot.
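To confirm the change, the script contents and the mounted datastores can be checked from the ESXi shell after the next reboot. This is a minimal sketch; datastore names and output will differ per environment:

# cat /etc/rc.local.d/local.sh
# esxcli storage filesystem list

The cat output should show the reset commands above the exit 0 line, and the filesystem list should show the VMFS datastores as mounted.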


Additional Information

Related KB articles for reference:

  1. Modifying the rc.local or local.sh file in VMware vSphere ESXi to execute commands while booting: https://knowledge.broadcom.com/external/article/324525
  2. Forcing a Fabric Login reset on Fibre Channel and FCOE Adapters: https://knowledge.broadcom.com/external/article/324547



Impact/Risks:

Loss of access to the datastores.