Continuous storage path loss/recovery occurs after a fabric event when connected to a Pure storage array running Purity 6.9.4 or older

Article ID: 433702

Updated On:

Products

VMware vSphere ESXi
VMware vSphere ESXi 8.0
VMware vSphere ESX 8.x

Issue/Introduction

A VCF administrator observes constant and persistent storage path losses following a fabric event when ESXi hosts are connected to a Pure Storage array. Upon further inspection, /var/log/vmkernel.log shows repeated PLOGI rejections from all Pure array targets:

2026-01-28T03:22:43.212Z In(182) vmkernel: cpu73:2098208)nfnic: <1>: INFO: fdls_process_tgt_abts_rsp: 3434: PLOGI BA_RJT received for tport_fcid: 0x10e00 OX_ID: 0x2001 with reason code: 0x9 reason code explanation: 0x0
2026-01-28T03:22:43.213Z In(182) vmkernel: cpu73:2098208)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1372: send tgt PLOGI: tgt: 0x10e00 OXID: 0x2001
2026-01-28T03:22:45.547Z In(182) vmkernel: cpu12:2098215)nfnic: <2>: INFO: fdls_process_tgt_abts_rsp: 3434: PLOGI BA_RJT received for tport_fcid: 0x10ec0 OX_ID: 0x20bb with reason code: 0x9 reason code explanation: 0x0
2026-01-28T03:22:45.547Z In(182) vmkernel: cpu12:2098215)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1372: send tgt PLOGI: tgt: 0x10ec0 OXID: 0x20bb
2026-01-28T03:22:51.672Z In(182) vmkernel: cpu12:2098215)nfnic: <2>: INFO: fdls_process_tgt_abts_rsp: 3434: PLOGI BA_RJT received for tport_fcid: 0x10ea0 OX_ID: 0x2048 with reason code: 0x9 reason code explanation: 0x0
2026-01-28T03:22:51.673Z In(182) vmkernel: cpu12:2098215)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1372: send tgt PLOGI: tgt: 0x10ea0 OXID: 0x2048
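
To gauge how widespread the rejections are, the vmkernel log can be searched for the BA_RJT signature. The following is a minimal, illustrative check (assuming the default log location /var/log/vmkernel.log); it counts rejections per target FCID to confirm that all Pure array targets are affected:

# Count PLOGI BA_RJT rejections per target port FCID
grep "PLOGI BA_RJT" /var/log/vmkernel.log | grep -o "tport_fcid: 0x[0-9a-f]*" | sort | uniq -c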

Out-of-order frames are also observed during the PRLI and PLOGI operations:

2026-01-28T01:43:47.741Z In(182) vmkernel: cpu24:2098215)nfnic: <2>: INFO: fdls_process_tgt_prli_rsp: 1826: Out of order PRLI Rsp. tport id:10ec0 state:1, flags:0, oxid:2204Restarting IT nexus
2026-01-28T01:45:14.391Z In(182) vmkernel: cpu4:2098215)nfnic: <2>: INFO: fdls_process_tgt_prli_rsp: 1826: Out of order PRLI Rsp. tport id:10ec0 state:1, flags:0, oxid:2204Restarting IT nexus
2026-01-28T01:46:40.857Z In(182) vmkernel: cpu4:2098215)nfnic: <2>: INFO: fdls_process_tgt_prli_rsp: 1826: Out of order PRLI Rsp. tport id:10ec0 state:1, flags:0, oxid:2204Restarting IT nexus
2026-01-28T01:48:07.419Z In(182) vmkernel: cpu4:2098215)nfnic: <2>: INFO: fdls_process_tgt_prli_rsp: 1826: Out of order PRLI Rsp. tport id:10ec0 state:1, flags:0, oxid:2204Restarting IT nexus
2026-01-28T01:49:34.000Z In(182) vmkernel: cpu4:2098215)nfnic: <2>: INFO: fdls_process_tgt_prli_rsp: 1826: Out of order PRLI Rsp. tport id:10ec0 state:1, flags:0, oxid:2204Restarting IT nexus
2026-01-28T03:06:09.225Z In(182) vmkernel: cpu98:2098208)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1686: Out of order PLOGI Rsp. tport id:10dc0 state:2, flags:0 oxid:2002Restarting IT nexus
2026-01-28T03:06:09.229Z In(182) vmkernel: cpu98:2098208)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1686: Out of order PLOGI Rsp. tport id:10c22 state:2, flags:0 oxid:205bRestarting IT nexus
2026-01-28T03:06:09.237Z In(182) vmkernel: cpu98:2098208)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1686: Out of order PLOGI Rsp. tport id:10e00 state:3, flags:0 oxid:205eRestarting IT nexus

Unknown FCoE frames may also be reported:

2026-01-28T05:07:19.099Z In(182) vmkernel: cpu76:2098208)nfnic: <1>: INFO: fnic_fdls_recv_frame: 4503: Received unknown FCoE frame of len: 176. Dropping frame
2026-01-28T05:07:20.099Z In(182) vmkernel: cpu76:2098208)nfnic: <1>: INFO: fdls_process_logo_req: 3649: Received LOGO req from 0xfffc01 in iport state:3 .Dropping the frame.
2026-01-28T05:07:37.116Z In(182) vmkernel: cpu76:2098208)nfnic: <1>: INFO: fnic_fdls_recv_frame: 4503: Received unknown FCoE frame of len: 176. Dropping frame
2026-01-28T05:07:38.116Z In(182) vmkernel: cpu76:2098208)nfnic: <1>: INFO: fdls_process_logo_req: 3649: Received LOGO req from 0xfffc01 in iport state:3 .Dropping the frame.
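
The same approach can be used to confirm the other two signatures. This is an illustrative example only; the message text is taken verbatim from the log excerpts above:

# Count out-of-order PRLI/PLOGI responses and dropped unknown FCoE frames
grep -c "Out of order PRLI Rsp" /var/log/vmkernel.log
grep -c "Out of order PLOGI Rsp" /var/log/vmkernel.log
grep -c "Received unknown FCoE frame" /var/log/vmkernel.log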

Environment

ESXi (All versions)
Pure Storage array running Purity 6.9.4 or earlier

Cause

This issue is observed when an excessive number of initiators are zoned to the array with active sessions. In this example, there were over 10,000 active sessions to the array. As a result, an RSCN issued from the array to the connected initiators after a fabric event causes all of those initiators to reconnect to the array simultaneously, leading to resource exhaustion on the Pure array.
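
The total session count is only visible on the array side, but each host's contribution can be estimated from ESXi. The commands below are an illustrative sketch: they list the FC adapters (initiators) on one host and count the storage paths that host maintains; multiplying a typical per-host path count by the number of hosts zoned to the array gives a rough idea of the login load the array must re-establish after an RSCN.

# List FC HBAs (initiator ports) on this host
esxcli storage san fc list

# Count the storage paths this host maintains to its targets
esxcli storage core path list | grep -c "Runtime Name"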

Resolution

Pure is working on a firmware fix, to be introduced in Purity 6.9.5, that will address this resource issue. Until then, the recommendation is to reduce the fabric size and to remove from the zoning any hosts that are not presented LUNs from the array, which also reduces the zone size.
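
After the zoning has been reduced and the fabric has stabilized, path health can be re-checked from each host. This is one possible verification, assuming the standard esxcli path state output; a non-zero dead-path count or new BA_RJT entries indicate the condition is still present:

# Confirm no paths remain dead and no new PLOGI rejections are being logged
esxcli storage core path list | grep -c "State: dead"
grep -c "PLOGI BA_RJT" /var/log/vmkernel.log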