Presenting LUNs from Nimble array to Cisco UCS hosts can result in APD state on ESXi 6.7 when using NFNIC driver 4.0.0.62 or 4.0.0.63

Article ID: 317959

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

When Nimble LUNs are presented to hosts, the hosts receive RSCNs (Registered State Change Notifications), the array then rejects the ADISC requests from the hosts, and the hosts enter an APD (All Paths Down) state:

RSCN Received:
nfnic: <2>: INFO: fnic_fdls_validate_and_get_frame_type: 3441: Received RSCN from target FCTL: 0x29 type: 0x1 s_id: 0xbe0540.

Array rejecting ADISC:

2020-12-31T08:23:14.158Z cpu0:2097904)nfnic: <2>: INFO: fdls_process_tgt_adisc_rsp: 1916: ADISC returned FC_LS_REJ from target: 0xbe0540
2020-12-31T08:23:14.158Z cpu29:2097897)nfnic: <1>: INFO: fdls_process_tgt_adisc_rsp: 1916: ADISC returned FC_LS_REJ from target: 0xbf05e0
2020-12-31T08:23:14.158Z cpu0:2097904)nfnic: <2>: INFO: fdls_process_tgt_adisc_rsp: 1916: ADISC returned FC_LS_REJ from target: 0xbe0560


APD:
2020-12-31T08:23:14.425Z cpu25:2097623)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1844: Could not select path for device "eui.095f4daec18b1ac96c9ce900a073fbda".
2020-12-31T08:23:14.425Z cpu16:2097934)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "eui.095f4daec18b1ac96c9ce900a073fbda" - issuing command 0x459b2c48e140
2020-12-31T08:23:14.425Z cpu16:2097934)WARNING: vmw_psp_rr: psp_rrSelectPath:2177: Could not select path for device "eui.095f4daec18b1ac96c9ce900a073fbda".
2020-12-31T08:23:14.425Z cpu16:2097934)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "eui.095f4daec18b1ac96c9ce900a073fbda" - failed to issue command due to Not found (APD), try again...
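
The three signatures above can be checked for together. The following is a minimal sketch, not a supported tool: the patterns are taken from the log excerpts in this article, and the function name `scan_vmkernel_log` is hypothetical. Other driver builds may word these messages differently.

```python
import re

# Patterns copied from the log excerpts in this article (assumed stable
# only for the nfnic builds discussed here).
SIGNATURES = {
    "rscn": re.compile(r"nfnic: .*Received RSCN from target"),
    "adisc_reject": re.compile(
        r"nfnic: .*ADISC returned FC_LS_REJ from target: (0x[0-9a-fA-F]+)"
    ),
    "apd": re.compile(r"failed to issue command due to Not found \(APD\)"),
}


def scan_vmkernel_log(lines):
    """Count each signature and collect the target IDs that rejected ADISC.

    `lines` is any iterable of log lines, for example an open file object.
    """
    counts = {name: 0 for name in SIGNATURES}
    rejected_targets = set()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "adisc_reject":
                    rejected_targets.add(match.group(1))
    return counts, rejected_targets
```

To use it, pass an open log file, for example `scan_vmkernel_log(open(path))`, where `path` points to a copy of the host's vmkernel log. Non-zero counts for all three signatures within a short time span are consistent with the symptom described here, but they do not prove it.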


Environment

VMware vSphere ESXi 6.7

Cause

A Nimble array starts a Unit Attention timer (40 seconds by default) when it associates a LUN with an initiator group. The array expects the initiators to send I/O during this period so that it can fail that I/O with CHECK CONDITION, UNIT ATTENTION. This makes the initiators aware that the LUN presentation has changed, and they pick up the change by issuing REPORT_LUNS. If the timer expires, the array assumes that any initiator that sent no I/O during the period has a fabric issue and sends an RSCN to those initiators so that they log out of the fabric and log back in.

The Cisco native FNIC (nfnic) driver does not perform a full LOGO. Instead it attempts to proceed with an ADISC, which the Nimble array rejects because it has already logged out the affected initiators.
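
The effect of the Unit Attention timer can be illustrated with a toy model. This is only a sketch of the behavior described in this section, not the array's implementation: the function name, the initiator names, and the treatment of the window boundary as inclusive are all assumptions made for illustration.

```python
UA_TIMER_SECONDS = 40  # default timer value given in this article


def initiators_sent_rscn(io_times_by_initiator, timer_seconds=UA_TIMER_SECONDS):
    """Return the initiators that sent no I/O within the timer window.

    io_times_by_initiator maps an initiator name to a list of I/O times,
    measured in seconds after the LUN was associated with the initiator
    group. In this model, those initiators are the ones that would be
    sent an RSCN when the timer expires.
    """
    missed = []
    for initiator, io_times in io_times_by_initiator.items():
        # Assumption: an I/O exactly at the timer value still counts.
        sent_in_window = any(0 <= t <= timer_seconds for t in io_times)
        if not sent_in_window:
            missed.append(initiator)
    return sorted(missed)


# Hypothetical hosts: esx01 sends I/O at 12 s, esx02 at 95 s, esx03 never.
example = {"esx01": [12.0], "esx02": [95.0], "esx03": []}
print(initiators_sent_rscn(example))       # ['esx02', 'esx03']
print(initiators_sent_rscn(example, 300))  # ['esx03']
```

In this model, a longer timer reduces the number of idle initiators that miss the window, which is the reasoning behind raising the timer value.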

Resolution

This issue affects only NFNIC driver versions 4.0.0.62 and 4.0.0.63. Cisco has released an updated NFNIC driver (4.0.0.65) that addresses this behavior:

https://customerconnect.vmware.com/downloads/details?downloadGroup=DT-ESXI67-CISCO-NFNIC-40065&productId=742

The issue is tracked under Cisco bug CSCvv43938: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvv43938
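
When auditing several hosts, the affected versions listed above can be checked programmatically. The following is a minimal sketch with hypothetical function names. It assumes the installed driver version has already been collected as a string (for example from the host's installed VIB list) and that the string begins with the four-part driver version, possibly followed by a build suffix.

```python
import re

# Affected nfnic driver versions, as listed in this article.
AFFECTED_VERSIONS = {(4, 0, 0, 62), (4, 0, 0, 63)}


def parse_nfnic_version(version):
    """Parse the leading four-part version, ignoring any build suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognized nfnic version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """Return True if the version is one of the affected releases."""
    return parse_nfnic_version(version) in AFFECTED_VERSIONS
```

For example, `is_affected("4.0.0.63")` returns `True` and `is_affected("4.0.0.65")` returns `False`. The check says nothing about versions this article does not mention.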


Workaround:
As a workaround, Nimble and VMware recommend increasing the Unit Attention (UA) timer on the Nimble array from 40 seconds to 300 seconds. The longer window ensures that the ESXi hosts send I/O before the timer expires, so the array can respond with a Unit Attention and the hosts pick up the change.