vCenter Reports "NFS All Paths Down"

Article ID: 404843

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

  • On the vCenter Host Summary page, the following error is displayed: "NFS all paths down."

  • The affected NFS datastore appears as inaccessible and is not connected to the ESXi host in vCenter.

Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

The communication between the initiator (ESXi host) and the NFS target datastore failed due to an MTU mismatch on the physical network.

Cause validation:

  • The /var/run/log/vmkernel.log file reports "Synchronous RPC cancel" events followed by "All Paths Down" errors:
    YYYY-MM-DDTHH:MM.SSSZ cpu74:32015542)SunRPC: 3291: Synchronous RPC cancel for client 0x431366c02730 IP ##.###.###.##.#.# proc 1 xid 0x26b0a255 attempt 2 of 3
    YYYY-MM-DDTHH:MM.SSSZ cpu50:32015579)SunRPC: 3291: Synchronous RPC cancel for client 0x431366c02730 IP ##.###.###.##.#.# proc 1 xid 0x26b0b239 attempt 1 of 3
    YYYY-MM-DDTHH:MM.SSSZ cpu8:2097742)StorageApdHandlerEv: 110: Device or filesystem with identifier [########-########] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu8:2097742)StorageApdHandlerEv: 126: Device or filesystem with identifier [########-########] has entered the All Paths Down Timeout state after being in the All Paths Down state for 500 seconds. I/Os will now be fast failed.

  • The /var/run/log/vobd.log file reports "All Paths Down" events followed by "Lost connection to the server":
    YYYY-MM-DDTHH:MM.SSSZ: [APDCorrelator] 4789768901796us: [vob.storage.apd.start] Device or filesystem with identifier [########-########] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ: [APDCorrelator] 4789840782785us: [esx.problem.storage.apd.start] Device or filesystem with identifier [########-########] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ: [vmfsCorrelator] 4789992783521us: [vob.vmfs.nfs.server.disconnect] Lost connection to the server ##.###.###.## mount point /Datastorename_ ##.###.###.##, mounted as ########-########-####-############ ("Datastorename_ ##.###.###.##")

    Refer to this KB for more information: NFS datastores enters All Paths Down condition with error 'SunRPC Synchronous RPC cancel for client'

  • Run the command "esxcfg-vswitch -l" to identify the vSwitch used for NFS traffic and check the MTU configured on it.
    esxcfg-vswitch -l

    DVS Name         Num Ports   Used Ports  Configured Ports  MTU
    Switch-name      ####        8           512               9000

    In the above example, the vSwitch is configured with MTU 9000. In this environment, vmnic0 and vmnic1 are the uplinks used for NFS communication.

  • Run the command "esxcfg-vmknic -l" to verify the MTU set on the VMkernel adapter (vmk) used for NFS traffic.
    esxcfg-vmknic -l
    vmk4       15                                      IPv4      ##.###.###.###                          ###.###.###.#   ##.###.###.###  ##:##:##:##:##:##  9000    65535     true    STATIC  defaultTcpipStack

    In the above example, it is confirmed that vmk4 is configured with MTU 9000.

  • Run the command "esxcfg-nics -l" to confirm the MTU configured on the physical nics (vmnics).

    esxcfg-nics -l
    Name    PCI          Driver      Link Speed      Duplex MAC Address       MTU    Description
    vmnic0  ####:##:##.# nenic       Up   50000Mbps  Full   ##:##:##:##:##:## 9000   Cisco Systems Inc Cisco VIC Ethernet NIC
    vmnic1  ####:##:##.# nenic       Up   50000Mbps  Full   ##:##:##:##:##:## 9000   Cisco Systems Inc Cisco VIC Ethernet NIC

    In the above example, it is confirmed that both vmnics are configured with MTU 9000.

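  • vmkping to the target IP with a payload size of 8972 bytes (MTU 9000 minus 28 bytes of IP and ICMP headers) and the don't-fragment option (-d) fails with 100% packet loss.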

    vmkping -I vmk# -d -s 8972 ##.###.###.##
    PING ##.###.###.## (##.###.###.##): 8972 data bytes

    --- ##.###.###.## ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss

  • vmkping to the target IP with a payload size of 1472 bytes (MTU 1500 minus 28 bytes of IP and ICMP headers) succeeds.

    vmkping -I vmk# -d -s 1472 ##.###.###.##
    PING ##.###.###.## (##.###.###.##): 1472 data bytes
    1480 bytes from ##.###.###.##: icmp_seq=0 ttl=128 time=0.504 ms
    1480 bytes from ##.###.###.##: icmp_seq=1 ttl=128 time=0.453 ms
    1480 bytes from ##.###.###.##: icmp_seq=2 ttl=128 time=0.494 ms

    --- ##.###.###.## ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.453/0.484/0.504 ms

    The ping results above confirm that host-to-target communication works with standard 1500-byte frames but fails with 9000-byte jumbo frames, which indicates an MTU mismatch in the network path.
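
The relationship between the MTU and the vmkping payload size, and the way the two ping results are read together, can be sketched in a few lines of Python. This is a minimal illustration, not a VMware tool: the function names are hypothetical, and the parser only assumes a statistics line containing "N% packet loss" as in the output above.

```python
import re

IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def packet_loss_percent(ping_output: str) -> int:
    """Extract the packet loss percentage from the ping statistics line."""
    match = re.search(r"(\d+)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return int(match.group(1))


def diagnose(jumbo_output: str, standard_output: str) -> str:
    """Classify the path from a jumbo-size and a standard-size ping result.

    Both tests are assumed to have been run with the don't-fragment option.
    """
    jumbo_loss = packet_loss_percent(jumbo_output)
    standard_loss = packet_loss_percent(standard_output)
    if standard_loss == 100:
        return "no connectivity at any size; not specific to MTU"
    if jumbo_loss == 100:
        return "likely MTU mismatch in the path"
    if jumbo_loss == 0:
        return "jumbo frames pass end to end"
    return "partial loss; inconclusive"


# Payload sizes used in the tests above
print(icmp_payload_for_mtu(9000))  # 8972
print(icmp_payload_for_mtu(1500))  # 1472

# Statistics lines taken from the example output above
jumbo = "3 packets transmitted, 0 packets received, 100% packet loss"
standard = "3 packets transmitted, 3 packets received, 0% packet loss"
print(diagnose(jumbo, standard))  # likely MTU mismatch in the path
```

The 28-byte difference is why the jumbo-frame test uses 8972 rather than 9000: with fragmentation disabled, a payload any larger would exceed the 9000-byte MTU even on a correctly configured path.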

Resolution

  • Engage the internal networking team to validate the MTU configuration across all physical switches and network devices in the path to the NFS target.
  • Ensure the MTU is consistently configured end to end across the network.
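
Once the MTU of each hop has been collected, the consistency check itself is simple. The sketch below is illustrative only: the hop names and values are hypothetical, and it assumes that a transit device configured above the required MTU (for example 9216 on a physical switch) is acceptable, while any hop below it breaks jumbo frames.

```python
from typing import Dict


def find_mtu_mismatches(hop_mtus: Dict[str, int], required_mtu: int = 9000) -> Dict[str, int]:
    """Return the hops whose MTU is below the required end-to-end value."""
    return {hop: mtu for hop, mtu in hop_mtus.items() if mtu < required_mtu}


# Hypothetical inventory of the path from the ESXi host to the NFS target
path = {
    "vmk4": 9000,
    "vSwitch": 9000,
    "vmnic0": 9000,
    "physical-switch-port": 1500,   # hypothetical misconfigured hop
    "nfs-server-interface": 9000,
}

print(find_mtu_mismatches(path))  # {'physical-switch-port': 1500}
```

Any hop reported here is a candidate for correction by the networking team; after the change, repeating the 8972-byte vmkping test confirms whether jumbo frames now pass end to end.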