Presenting LUNs from Nimble array to Cisco UCS hosts can result in APD state on ESXi 6.7 when using NFNIC driver 4.0.0.62 or 4.0.0.63

Article ID: 317959

Updated On: 02-18-2025

Products

VMware vSphere ESXi

Issue/Introduction

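When Nimble LUNs are presented to hosts, the hosts receive Registered State Change Notifications (RSCNs), the array then rejects the hosts' ADISC requests, and the hosts enter an All Paths Down (APD) state. The following log entries are observed: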

RSCN Received:
nfnic: <2>: INFO: fnic_fdls_validate_and_get_frame_type: 3441: Received RSCN from target FCTL: 0x29 type: 0x1 s_id: 0xbe0540.

Array rejecting ADISC: 

YYYY-MM-DD HH:MM:SS cpu0:2097904)nfnic: <2>: INFO: fdls_process_tgt_adisc_rsp: 1916: ADISC returned FC_LS_REJ from target: 0xbe0540
YYYY-MM-DD HH:MM:SS cpu29:2097897)nfnic: <1>: INFO: fdls_process_tgt_adisc_rsp: 1916: ADISC returned FC_LS_REJ from target: 0xbf05e0
YYYY-MM-DD HH:MM:SS cpu0:2097904)nfnic: <2>: INFO: fdls_process_tgt_adisc_rsp: 1916: ADISC returned FC_LS_REJ from target: 0xbe0560


APD:

YYYY-MM-DD HH:MM:SS cpu25:2097623)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1844: Could not select path for device "eui.095f4daec18b1ac96c9ce900a073fbda".
YYYY-MM-DD HH:MM:SS cpu16:2097934)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "eui.095f4daec18b1ac96c9ce900a073fbda" - issuing command 0x459b2c48e140
YYYY-MM-DD HH:MM:SS cpu16:2097934)WARNING: vmw_psp_rr: psp_rrSelectPath:2177: Could not select path for device "eui.095f4daec18b1ac96c9ce900a073fbda".
YYYY-MM-DD HH:MM:SS cpu16:2097934)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "eui.095f4daec18b1ac96c9ce900a073fbda" - failed to issue command due to Not found (APD), try again...
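
The three log signatures above can be searched for together. The following is a minimal Python sketch, not a supported tool: the regular expressions are derived only from the sample lines in this article, the function names `classify` and `matches_signature` are hypothetical, and other driver builds may word these messages differently.

```python
import re

# Patterns derived from the sample log lines in this article (assumption:
# the message wording is the same on the host being checked).
PATTERNS = {
    "rscn": re.compile(
        r"nfnic: .*Received RSCN from target .*s_id: (0x[0-9a-fA-F]+)"
    ),
    "adisc_reject": re.compile(
        r"nfnic: .*ADISC returned FC_LS_REJ from target: (0x[0-9a-fA-F]+)"
    ),
    "apd": re.compile(
        r'device "([^"]+)" - failed to issue command due to Not found \(APD\)'
    ),
}


def classify(lines):
    """Map each event type to the target IDs or device names matched, in log order."""
    events = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events[name].append(match.group(1))
    return events


def matches_signature(events):
    """True only when RSCN, ADISC reject, and APD entries were all found."""
    return all(events[name] for name in PATTERNS)
```

For example, passing the lines of a host's `vmkernel.log` to `classify` and the result to `matches_signature` indicates whether all three symptoms appear in that log. The sketch does not check that they occur in the order described.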

Environment

VMware vSphere ESXi 6.7

Cause

When a Nimble array associates a LUN with an initiator group, it starts a Unit Attention timer (40 seconds by default). The array expects the initiators to send I/O during this period so that it can fail that I/O with CHECK CONDITION, UNIT ATTENTION. This tells the initiators that the LUN presentation has changed, and they pick up the change by issuing REPORT_LUNS. If the timer expires, the array assumes that any initiator that sent no I/O during the period has a fabric issue, and it sends an RSCN to those initiators to make them log out of the fabric and log back in.

The Cisco native FNIC (nfnic) driver does not perform a full LOGO. Instead, it attempts to proceed with an ADISC, which the Nimble array rejects because it has already logged out the affected initiators.
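
The sequence described in this section can be illustrated with a toy model. This is a simplified Python sketch of the behavior as described here, not the array's or the driver's actual logic. The function names, the host names in the usage note, and the treatment of an I/O arriving exactly at the timer boundary are assumptions made for illustration.

```python
UA_TIMER_DEFAULT_S = 40  # default Unit Attention timer stated in this article


def initiators_sent_rscn(first_io_after_map, ua_timer_s=UA_TIMER_DEFAULT_S):
    """Return the initiators that sent no I/O within the timer window.

    first_io_after_map maps an initiator name to the seconds elapsed between
    the LUN mapping and that initiator's first I/O, or None if it sent none.
    Assumption: an I/O arriving exactly at the timer value counts as in time.
    """
    return sorted(
        name
        for name, delay in first_io_after_map.items()
        if delay is None or delay > ua_timer_s
    )


def outcome(driver_does_full_logo):
    """Outcome for an initiator that received the RSCN, per the text above."""
    if driver_does_full_logo:
        # Logs out of the fabric and back in, as the array expects.
        return "re-login"
    # The array has already logged the initiator out, so ADISC is rejected.
    return "ADISC rejected"
```

With hypothetical hosts `{"esx01": 5, "esx02": 120, "esx03": None}`, the default 40-second timer flags `esx02` and `esx03`, while a 300-second timer flags only `esx03`, which sent no I/O at all.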

Resolution

This issue affects only NFNIC driver versions 4.0.0.62 and 4.0.0.63. Cisco has released an updated NFNIC driver (4.0.0.65) that addresses this behavior:

https://support.broadcom.com/group/ecx/productfiles?subFamily=VMware%20vSphere&displayGroup=VMware%20vSphere%20-%20Standard&release=6.7&os=&servicePk=202617&language=EN 

The issue is tracked under Cisco bug ID CSCvv43938: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvv43938
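
To decide whether a host needs the updated driver, the installed nfnic version can be compared against the two affected versions. The sketch below is a minimal Python helper under stated assumptions: the version string is taken from the nfnic entry in the output of `esxcli software vib list`, and it begins with the four-part driver version, possibly followed by a build suffix. The function names and the suffix shown in the usage note are hypothetical.

```python
import re

# Affected versions as stated in this article.
AFFECTED = {"4.0.0.62", "4.0.0.63"}


def driver_version(vib_version):
    """Extract the leading four-part version, or None if the string has none."""
    match = re.match(r"(\d+\.\d+\.\d+\.\d+)", vib_version.strip())
    return match.group(1) if match else None


def is_affected(vib_version):
    """True when the leading version is one of the affected nfnic versions."""
    return driver_version(vib_version) in AFFECTED
```

For example, `is_affected("4.0.0.63-1OEM.670.1.28.10302608")` evaluates to `True` and `is_affected("4.0.0.65")` to `False`. A string with no recognizable version is reported as not affected, so an unexpected format should be checked by hand.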

Workaround:

As a workaround, Nimble and VMware recommend increasing the Unit Attention (UA) timer on the Nimble array from 40 seconds to 300 seconds. The longer window ensures that the ESXi hosts send I/O before the timer expires, so the array can respond with a Unit Attention and the hosts pick up the change.