Temporary All Paths Down (APD) occurs when there is a partial power outage in the environment

Article ID: 406074

Products

VMware vSphere ESXi
VMware vSphere ESX 7.x
VMware vSphere ESX 8.x
VMware vSphere ESXi 8.0

Issue/Introduction

A VCF administrator observes an All Paths Down (APD) state in the environment that lasts only a short time (about 15 minutes).

Environment

ESXi (all versions)

Cause

A Registered State Change Notification (RSCN) was sent to all devices in the zone immediately before the APD event began:

2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3888: RSCN payload_len: 0x800 page_len: 0x4
2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb00860
2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb00880
2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb008a0
2025-07-14T23:21:11.253Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb008c0
2025-07-14T23:21:11.253Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3907: RSCN received for num_ports: 0 payload_len: 2048 page_len: 4
2025-07-14T23:21:11.258Z cpu40:2098053)nfnic: <2>: INFO: fdls_process_rscn: 3888: RSCN payload_len: 0x800 page_len: 0x4
2025-07-14T23:21:11.258Z cpu40:2098053)nfnic: <2>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xca0820
2025-07-14T23:21:11.258Z cpu40:2098053)nfnic: <2>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xca0840
2025-07-14T23:21:11.258Z cpu40:2098053)nfnic: <2>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xca0860
2025-07-14T23:21:11.258Z cpu40:2098053)nfnic: <2>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xca0880

Next, the nfnic driver reports that the storage array target ports that were logged in to the fabric are no longer in the GPN_FT list:

2025-07-14T23:21:11.253Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb008a0 not found in GPN_FT list
2025-07-14T23:21:11.254Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb008c0 not found in GPN_FT list
2025-07-14T23:21:11.260Z cpu32:2098053)nfnic: <2>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xca0840 not found in GPN_FT list
2025-07-14T23:21:11.260Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xca0880 not found in GPN_FT list
2025-07-14T23:21:11.282Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb00860 not found in GPN_FT list
2025-07-14T23:21:11.282Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb00880 not found in GPN_FT list
2025-07-14T23:21:11.305Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xca0820 not found in GPN_FT list
2025-07-14T23:21:11.305Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xca0860 not found in GPN_FT list
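
The comparison between the two excerpts above can be checked mechanically. The following is a minimal Python sketch, not a supported tool: it assumes the nfnic messages keep the exact wording shown in this article, and the names `collect`, `RSCN_RE`, and `REMOVED_RE` are hypothetical helpers introduced only for illustration. It collects the port IDs named in the RSCN and the port IDs reported as missing from the GPN_FT list, then lists any RSCN port that is still present.

```python
import re

# Patterns based on the nfnic messages quoted in this article (assumed stable).
RSCN_RE = re.compile(r"RSCN for port id: (0x[0-9a-f]+)")
REMOVED_RE = re.compile(r"Remove port: (0x[0-9a-f]+) not found in GPN_FT list")


def collect(pattern, lines):
    """Return the set of port IDs that the pattern captures in the lines."""
    return {m.group(1) for line in lines if (m := pattern.search(line))}


# Sample input: the lines tagged <1> from the two excerpts above.
log_lines = [
    "2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb00860",
    "2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb00880",
    "2025-07-14T23:21:11.252Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb008a0",
    "2025-07-14T23:21:11.253Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3897: RSCN for port id: 0xb008c0",
    "2025-07-14T23:21:11.253Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb008a0 not found in GPN_FT list",
    "2025-07-14T23:21:11.254Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb008c0 not found in GPN_FT list",
    "2025-07-14T23:21:11.282Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb00860 not found in GPN_FT list",
    "2025-07-14T23:21:11.282Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2460: Remove port: 0xb00880 not found in GPN_FT list",
]

rscn_ports = collect(RSCN_RE, log_lines)
removed_ports = collect(REMOVED_RE, log_lines)
still_present = rscn_ports - removed_ports

print("Ports named in RSCN:", sorted(rscn_ports))
print("Ports still in GPN_FT list:", sorted(still_present))
```

For the sample lines, every port named in the RSCN is also reported as removed, so no array target remains visible on that adapter.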

140 seconds later, which matches the default value of the Misc.APDTimeout advanced setting, the APD timeout expires and the host begins to fast-fail all outstanding I/O to the LUNs:

2025-07-14T23:23:31.320Z: [APDCorrelator] 5387651025839us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.320Z: [APDCorrelator] 5387470962405us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.328Z: [APDCorrelator] 5387651033809us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.328Z: [APDCorrelator] 5387470970235us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.346Z: [APDCorrelator] 5387651051818us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.346Z: [APDCorrelator] 5387470988263us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.363Z: [APDCorrelator] 5387651068884us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-07-14T23:23:31.363Z: [APDCorrelator] 5387471005363us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
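
The 140-second interval can be confirmed from the timestamps in the excerpts. This is a small worked example in Python, offered as a sketch: it takes the last "Remove port" timestamp as the point where the final path was lost, which is an assumption made for illustration, and the variable names are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the log excerpts (UTC, millisecond precision).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Last "Remove port ... not found in GPN_FT list" message in the excerpt.
last_port_removed = datetime.strptime("2025-07-14T23:21:11.305Z", TS_FORMAT)

# First "vob.storage.apd.timeout" message in the excerpt.
apd_timeout = datetime.strptime("2025-07-14T23:23:31.320Z", TS_FORMAT)

elapsed_seconds = (apd_timeout - last_port_removed).total_seconds()
print(f"Seconds between port removal and APD timeout: {elapsed_seconds:.3f}")
```

The result is just over 140 seconds, consistent with the "after being in the All Paths Down state for 140 seconds" text in the events.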

About 15 minutes after the first RSCN, another RSCN is received and the array target ports are back in the Get Port Names by FC-4 Type (GPN_FT) list:

RSCN:

2025-07-14T23:36:45.612Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3888: RSCN payload_len: 0x800 page_len: 0x4
2025-07-14T23:36:45.612Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_rscn: 3907: RSCN received for num_ports: 0 payload_len: 2048 page_len: 4

The Port Login (PLOGI) sent to each array target is accepted:

2025-07-14T23:36:45.615Z cpu62:2098046)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xb008c0 OXID: 0x2000
2025-07-14T23:36:45.615Z cpu62:2098046)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xb008c0
2025-07-14T23:36:46.854Z cpu47:2098053)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xca0840 OXID: 0x2000
2025-07-14T23:36:46.854Z cpu47:2098053)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xca0880 OXID: 0x2001
2025-07-14T23:36:46.855Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xca0880
2025-07-14T23:36:47.124Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xca0840
2025-07-14T23:36:49.143Z cpu32:2098046)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xb00860 OXID: 0x2000
2025-07-14T23:36:49.143Z cpu32:2098046)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xb00880 OXID: 0x2001
2025-07-14T23:36:49.143Z cpu32:2098046)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xb00880
2025-07-14T23:36:49.143Z cpu32:2098046)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xb00860
2025-07-14T23:36:49.238Z cpu47:2098053)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xca0820 OXID: 0x2000
2025-07-14T23:36:49.238Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xca0820
2025-07-15T23:46:42.511Z cpu34:2098053)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xca0860 OXID: 0x2000
2025-07-15T23:46:42.512Z cpu34:2098053)nfnic: <2>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xca0860
2025-07-15T23:46:48.116Z cpu40:2098046)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xb008a0 OXID: 0x2000
2025-07-15T23:46:48.116Z cpu40:2098046)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xb008a0
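
To confirm that every PLOGI that was sent was also accepted, the send and accept messages can be paired by target port ID. The following Python sketch assumes the message wording shown above; `plogi_status`, `SEND_RE`, and `ACCEPT_RE` are hypothetical names used only for this example.

```python
import re

# Patterns based on the nfnic PLOGI messages quoted in this article.
SEND_RE = re.compile(r"send tgt PLOGI: tgt: (0x[0-9a-f]+)")
ACCEPT_RE = re.compile(r"PLOGI accepted by target: (0x[0-9a-f]+)")


def plogi_status(lines):
    """Return (sent, accepted) sets of target port IDs found in the lines."""
    sent, accepted = set(), set()
    for line in lines:
        if m := SEND_RE.search(line):
            sent.add(m.group(1))
        elif m := ACCEPT_RE.search(line):
            accepted.add(m.group(1))
    return sent, accepted


# Sample input: four consecutive lines tagged <2> from the excerpt above.
log_lines = [
    "2025-07-14T23:36:46.854Z cpu47:2098053)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xca0840 OXID: 0x2000",
    "2025-07-14T23:36:46.854Z cpu47:2098053)nfnic: <2>: INFO: fdls_tgt_send_plogi: 1353: send tgt PLOGI: tgt: 0xca0880 OXID: 0x2001",
    "2025-07-14T23:36:46.855Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xca0880",
    "2025-07-14T23:36:47.124Z cpu47:2098053)nfnic: <2>: INFO: fdls_process_tgt_plogi_rsp: 1687: PLOGI accepted by target: 0xca0840",
]

sent, accepted = plogi_status(log_lines)
pending = sent - accepted

print("PLOGI sent to:", sorted(sent))
print("PLOGI not yet accepted:", sorted(pending))
```

A non-empty `pending` set would point to a target that is back in the name server but is not accepting logins, which is a different problem from the one described in this article.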

The storage array ports are now back on the fabric and the ESXi host exits the APD state for all LUNs and restores storage paths:

2025-07-14T23:36:52.385Z: [APDCorrelator] 5388452115491us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.################] has exited the All Paths Down state.
2025-07-14T23:36:52.385Z: [APDCorrelator] 5388272027156us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [naa.################] has exited the All Paths Down state.
2025-07-14T23:36:52.390Z: [scsiCorrelator] 5388452120873us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T6:L22 changed state from dead
2025-07-14T23:36:52.391Z: [scsiCorrelator] 5388452121377us: [vob.scsi.scsipath.por] Power-on Reset occurred on vmhba0:C0:T6:L40
2025-07-14T23:36:52.395Z: [APDCorrelator] 5388452125984us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.################] has exited the All Paths Down state.
2025-07-14T23:36:52.395Z: [APDCorrelator] 5388272037663us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [naa.################] has exited the All Paths Down state.
2025-07-14T23:36:52.404Z: [scsiCorrelator] 5388452134303us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T6:L40 changed state from dead 
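
The time the host spent fast-failing I/O can be estimated from the first apd.timeout and apd.exit events. This Python sketch is illustrative only: it assumes the event line layout shown in this article, and `parse_event`, `TS_RE`, and `EVENT_RE` are hypothetical helpers.

```python
import re
from datetime import datetime

# Leading timestamp and bracketed event ID, as they appear in the excerpts.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")
EVENT_RE = re.compile(r"\[((?:vob|esx)\.[a-z.]+)\]")


def parse_event(line):
    """Return (timestamp, event_id) for an event line, or None if no match."""
    ts, ev = TS_RE.match(line), EVENT_RE.search(line)
    if ts is None or ev is None:
        return None
    return datetime.strptime(ts.group(1), "%Y-%m-%dT%H:%M:%S.%f"), ev.group(1)


# Sample input: the first timeout event and the first exit event above.
log_lines = [
    "2025-07-14T23:23:31.320Z: [APDCorrelator] 5387651025839us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.",
    "2025-07-14T23:36:52.385Z: [APDCorrelator] 5388452115491us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.################] has exited the All Paths Down state.",
]

# Record the first time each event ID is seen.
first_seen = {}
for line in log_lines:
    parsed = parse_event(line)
    if parsed is not None:
        first_seen.setdefault(parsed[1], parsed[0])

fast_fail_seconds = (
    first_seen["vob.storage.apd.exit"] - first_seen["vob.storage.apd.timeout"]
).total_seconds()
print(f"Seconds in APD Timeout state: {fast_fail_seconds:.3f}")
```

For the sample lines this comes to a little over 13 minutes, which together with the 140 seconds before the timeout accounts for the roughly 15-minute outage.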

Resolution

In this instance, the datacenter suffered a power outage that affected the storage array but not the ESXi hosts or the fabric switches. The array went through an unexpected power cycle and took 15 minutes to come back online. During that time the array ports dropped off the fabric, which resulted in the temporary APD state.