Intermittent NFS APDs on VMware ESXi 5.5 U1

Article ID: 305227

Products

VMware vSphere ESXi

Issue/Introduction

This article describes a specific issue. If you experience all of the symptoms described in the Symptoms section, follow the Resolution section. If you experience some but not all of these symptoms, your issue is not related to this article. Also verify that the NFS storage is configured according to VMware best practices and that the network connections between the hosts and the NFS storage are not routed.



Symptoms:
When running ESXi 5.5 Update 1, the ESXi host intermittently loses connectivity to NFS storage and an All Paths Down (APD) condition is observed on NFS volumes. You experience these symptoms:
  • Intermittent APDs are reported for NFS datastores. As a consequence, Windows virtual machine guests may experience blue screen errors and Linux virtual machines may end up with read-only file systems.

    Note: NFS volumes include VSA datastores.

  • During and after the APD condition, the array still responds to ping, netcat tests succeed, and there is no evidence of a physical network or NFS storage array issue.
  • The NFS storage array logs and traces do not indicate an issue.
  • Hosts that are not running ESXi 5.5 U1 continue to work and can read and write to the NFS share.
  • The vobd.log files (located at /var/log/) contain entries similar to:

    Note: These log entries use a volume named 12345678-abcdefg0 as an example:

    YYYY-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
    YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
    YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
    YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
    YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
    YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
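
To check a host for this symptom, the vobd.log entries above can be searched for APD start and timeout events. The following Python snippet is a minimal sketch, not a VMware-supplied tool. It assumes the log lines follow the format shown in the sample above, and the function name find_apd_events is hypothetical. It matches only the esx.problem.storage.apd.* entries so that each event is reported once rather than twice.

```python
import re

# Pattern based on the sample vobd.log entries in this article (assumed format).
APD_EVENT = re.compile(
    r"^(?P<ts>\S+): \[APDCorrelator\] \d+us: "
    r"\[esx\.problem\.storage\.apd\.(?P<kind>start|timeout)\] "
    r"Device or filesystem with identifier \[(?P<ident>[^\]]+)\]"
)


def find_apd_events(lines):
    """Return (timestamp, kind, identifier) tuples for APD start/timeout entries."""
    events = []
    for line in lines:
        match = APD_EVENT.match(line.strip())
        if match:
            events.append((match["ts"], match["kind"], match["ident"]))
    return events
```

Because the function accepts any iterable of lines, it can be given an open file object for a copy of /var/log/vobd.log. A start event followed by a timeout event for the same identifier corresponds to the sequence shown in the sample entries.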


Environment

VMware vSphere ESXi 5.5

Resolution

This issue is resolved in ESXi 5.5 Express Patch 04. For more information, see VMware ESXi 5.5, Patch ESXi550-201406401-SG: Updates esx-base (2077360).

If you are unable to upgrade, VMware recommends using ESXi 5.5 GA with all appropriate security patches. For more information on patching ESXi 5.5 GA for the Heartbleed vulnerability, see Resolving OpenSSL Heartbleed for ESXi 5.5 - CVE-2014-0160 (2076665).

Note: Use the esxcli software vib list command to list the VIBs installed on a host, along with their versions, to confirm which patches have been applied.
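
When checking several hosts, it can help to extract the esx-base version from saved esxcli software vib list output and compare it against the version documented for the patch. The following Python snippet is a minimal sketch. It assumes the command's default column layout, with the VIB name in the first column and the version in the second. The function name vib_versions is hypothetical.

```python
def vib_versions(listing):
    """Map VIB name to version from `esxcli software vib list` text output.

    Assumes whitespace-separated columns with Name first and Version second.
    """
    versions = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 2 or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        versions[fields[0]] = fields[1]
    return versions
```

For example, vib_versions(saved_output).get("esx-base") returns the installed esx-base version string, or None if the VIB is not listed. Here saved_output is a hypothetical variable holding the captured command output.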

Additional Information


Restarting the Management agents in ESXi
Virtual machines stop responding when any LUN on the host is in an all-paths-down (APD) condition
Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
Resolving OpenSSL Heartbleed for ESXi 5.5 - CVE-2014-0160
Intermittent NFS APDs on VMware ESXi 5.5 U1 (Japanese version)
VMware ESXi 5.5, Patch ESXi550-201406401-SG: Updates esx-base
Intermittent NFS APDs on VMware ESXi 5.5 U1 (Chinese version)