Troubleshooting NFS datastore connectivity issues

Article ID: 323107

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • The NFS share cannot be mounted by the ESXi host.
  • The NFS share is mounted, but nothing can be written to it.
  • The NFS datastore is inaccessible.
  • You see entries similar to:

    NFS Error: Unable to connect to NFS server

    WARNING: NFS: 983: Connect failed for client 0xb613340 sock 184683088: I/O error

    WARNING: NFS: 898: RPC error 12 (RPC failed) trying to get port for Mount Program (100005) Version (3) Protocol (TCP) on Server (xxx.xxx.xxx.xxx)

    Network cable is unplugged
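
The log entries above can be searched for from the ESXi shell. The following is a minimal sketch, not an official procedure: the function name nfs_log_entries is hypothetical, the patterns are taken only from the sample messages quoted above, and the log path shown in the usage comment is an assumption that may differ on your host.

```shell
#!/bin/sh
# Sketch only: print lines from a log file that resemble the NFS
# messages quoted in the Symptoms list. The function name is illustrative.

nfs_log_entries() {
    # Match the two message prefixes shown in the samples above.
    grep -E 'WARNING: NFS:|NFS Error:' "$1"
}

# Usage (log path is an assumption; adjust for your host):
#   nfs_log_entries /var/log/vmkernel.log
```

If the function prints nothing, the log contains no lines with these prefixes. That alone does not rule out an NFS problem.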


Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Resolution

To resolve this issue, work through each of the following steps and confirm that it holds true for your VMware environment:

Caution: Do not skip a step. The steps provide instructions or a link to a document for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution.
  1. Check the MTU size configuration on the port group which is designated as the NFS VMkernel port group. If it is set to anything other than 1500 or 9000, test the connectivity using the vmkping command:

    # vmkping -I vmkN -s nnnn xxx.xxx.xxx.xxx

    Where:
     
    • vmkN is vmk0, vmk1, etc., depending on which vmknic is assigned to NFS.
    • nnnn is the MTU size minus 28 bytes of overhead (a 20-byte IP header plus an 8-byte ICMP header). For example, for an MTU size of 9000, use 8972.
    • xxx.xxx.xxx.xxx is the IP address of the target NFS storage.

    To list the vmknics, run the command:

    # esxcfg-vmknic -l

    Check the output for the vmkN interface associated with NFS.
     
  2. Verify connectivity to the NFS server and ensure that it is accessible through the firewalls. For more information, see Cannot connect to NFS network share (broadcom.com)
  3. Run the netcat (nc) command to see if you can reach the NFS server nfsd TCP/UDP port (default 2049) on the storage array from the host:

    # nc -vz array-IP 2049

    If the NFS server is reachable, the output is similar to:

    # nc -vz 10.0.0.4 2049
    Connection to 10.0.0.4 2049 port [tcp/nfs] succeeded!

    If the NFS server is not reachable, the output is similar to:

    # nc -vz 10.0.0.4 2049
    nc: connect to 10.0.0.4 port 2049 (tcp) failed: Connection timed out
     
  4. Verify that the ESXi host can vmkping the NFS server. For more information, see Testing VMkernel network connectivity with the vmkping command (broadcom.com)
  5. Verify that the NFS server can ping the VMkernel IP of the ESXi host.
  6. Verify that the virtual switch being used for storage is configured correctly. 

    Note: Ensure that there are enough available ports on the virtual switch. 
     
  7. Verify that the storage array is listed in the VMware Hardware Compatibility Guide. For more information, see the VMware Compatibility Guide. Consult your hardware vendor to ensure that the array is configured properly.

    Note: Some array vendors have a minimum microcode/firmware version that is required to work with the ESXi host.
     
  8. Verify that the physical hardware functions correctly. Consult your hardware vendor for more details.
  9. If the NFS server is a Windows server, verify that it is correctly configured for NFS. For more information, see Troubleshooting the failed process of adding a datastore from a Windows Services NFS device (broadcom.com).
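
The payload arithmetic from step 1 and the port check from step 3 can be sketched as a small helper. This is a sketch only: the function names payload_size and print_checks are hypothetical, and the example values (vmk1, 9000, 10.0.0.4) are placeholders. The script only prints the commands for review and does not run them.

```shell
#!/bin/sh
# Sketch only: derive the vmkping payload size from an MTU and print the
# connectivity checks from steps 1 and 3. Function names are illustrative.

# ICMP payload = MTU - 20-byte IPv4 header - 8-byte ICMP header.
payload_size() {
    echo $(( $1 - 28 ))
}

# Print (do not execute) the checks for a vmknic, an MTU, and an NFS server.
print_checks() {
    vmk="$1"; mtu="$2"; server="$3"
    echo "vmkping -I $vmk -s $(payload_size "$mtu") $server"
    echo "nc -vz $server 2049"
}

# Placeholder values, not taken from a real environment.
print_checks vmk1 9000 10.0.0.4
```

For an MTU of 9000 the first printed command uses a payload of 8972, matching the example in step 1. When validating jumbo frames, it can also help to add the vmkping -d option, which sets the Don't Fragment bit so that an oversized packet fails instead of being fragmented along the path.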

To troubleshoot a mount being read-only:
  1. Verify that the permissions of the NFS server have not been set to read-only for this ESXi host.
  2. Verify that the NFS share was not mounted with the read-only box selected.
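
To confirm from the host whether a mounted datastore actually accepts writes, one option is to create and remove a small probe file. The following is a minimal sketch under stated assumptions: the function name check_writable and the probe file name are hypothetical, and the datastore path in the usage comment is a placeholder to replace with your own.

```shell
#!/bin/sh
# Sketch only: report whether a directory (for example, a mounted NFS
# datastore path) accepts writes by creating and removing a probe file.

check_writable() {
    dir="$1"
    probe="$dir/.write_probe_$$"    # $$ is the shell PID, keeps the name unique
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable"
    else
        echo "read-only or inaccessible"
        return 1
    fi
}

# Usage (placeholder path; substitute your datastore name):
#   check_writable /vmfs/volumes/<datastore-name>
```

A failure here does not show which side imposes the restriction, so the export permissions on the NFS server and the mount options on the host both still need to be checked as described above.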

In addition, see Remounting a disconnected NFS datastore from the ESXi command line (broadcom.com).


If the above troubleshooting has not resolved the issue, files on the datastore may still be locked. For example, attempting to unmount the NAS volume may fail with an error similar to:

WARNING: NFS: 1797: <NFS UUID> has open files, cannot be unmounted

To troubleshoot the lock:
  1. Identify the ESXi host holding the lock. For more information, see Investigating virtual machine file locks on ESXi hosts (broadcom.com)
  2. Restart the management agents on the host. For more information, see Restarting the Management agents in ESXi (broadcom.com)
  3. If the lock remains, a host reboot is required to break the lock.

    Note: If you wish to investigate the cause of the locking issue further, be sure to capture the host logs before rebooting.


Additional Information