NFS connectivity issues on NexentaStor NFS filers on ESXi 5.x/6.0

Article ID: 337964

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
When using NFS datastores on some NexentaStor NFS filer models on an ESXi host, you experience these symptoms:
  • The NFS datastores appear to be unavailable (grayed out) in vCenter Server or when accessed through the vSphere Client.
  • The NFS shares reappear after a few minutes.
  • Virtual machines located on the NFS datastore are in a hung/paused state when the NFS datastore is unavailable.
  • This issue is most often seen after a host upgrade to ESXi 5.x or the addition of an ESXi 5.x host to the environment.
  • In the /var/log/vmkernel.log file on the ESXi host, you see entries similar to:
    • NFSLock: 515: Stop accessing fd 0xc21eba0 4
      NFS: 283: Lost connection to the server 192.168.100.1 mount point /vol/datastore01,
      mounted as bf7ce3db-42c081a2-0000-000000000000 ("datastore01")
      NFSLock: 477: Start accessing fd 0xc21eba0 again
      NFS: 292: Restored connection to the server 192.168.100.1 mount point /vol/datastore01,
      mounted as bf7ce3db-42c081a2-0000-000000000000 ("datastore01")
    • <YYYY-MM-DD>T<time> Z cpu2:8194)StorageApdHandler: 277: APD Timer killed for ident [b63367a0-e78ee62a]
      <YYYY-MM-DD>T<time> Z cpu2:8194)StorageApdHandler: 402: Device or filesystem with ID [b63367a0-e78ee62a] exited All Paths Down state.
      <YYYY-MM-DD>T<time> Z cpu2:8194)StorageApdHandler: 902: APD Exit for ident [b63367a0-e78ee62a]!
      <YYYY-MM-DD>T<time> Z cpu6:8208)NFSLock: 570: Start accessing fd 0x4100108487f8 again
      <YYYY-MM-DD>T<time> Z cpu2:8194)WARNING: NFS: 322: Lost connection to the server 10.20.90.2 mount point /vol/nfs_snapmirror_test,
      mounted as bd5763b1-19271ed7-0000-000000000000 ("AFO_SNAPMIRROR_TEST")
      <YYYY-MM-DD>T<time> Z cpu2:8194)WARNING: NFS: 322: Lost connection to the server 10.20.90.2 mount point /vol/nfs_vmware_isos_vol01,
      mounted as 654dc625-6010e4e6-0000-000000000000 ("NFS_SATA_ISOS_VOL01")
  • In the /var/log/vobd.log file on the ESXi host, you see entries similar to:
    • <YYYY-MM-DD>T<time> Z: [vmfsCorrelator] 6084893035396us: [esx.problem.vmfs.nfs.server.disconnect]
      192.168.100.1 /vol/datastore01 bf7ce3db-42c081a2-0000-000000000000 volume-name:datastore01
      <YYYY-MM-DD>T<time> Z: [vmfsCorrelator] 6085187880809us: [esx.problem.vmfs.nfs.server.restored]
      192.168.100.1 /vol/datastore01 bf7ce3db-42c081a2-0000-000000000000 volume-name:datastore01
  • When examining a packet trace from the VMkernel port used for NFS, zero window TCP segments may be seen originating from the NFS filer in Wireshark (a quick way to confirm this is sketched after the note below):
    • No      Time        Source     Destination  Protocol  Length  Info
      784095  325.356980  10.1.1.35  10.1.1.26    RPC       574     [TCP ZeroWindow] Continuation
      792130  325.452001  10.1.1.35  10.1.1.26    TCP       1514    [TCP ZeroWindow] [TCP segment of a reassembled PDU]
  • ESXi hosts may disconnect from vCenter Server while the datastore is unavailable.
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
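
To quickly confirm these symptoms from the command line, you can search the host logs for the messages shown above and filter a packet capture for zero window segments. This is only a sketch: the log paths are the ESXi defaults, trace.pcap is a placeholder name for a capture taken on the NFS VMkernel port, and the tshark command is run on a workstation with Wireshark installed, not on the ESXi host.

  On the ESXi host, search for NFS disconnect and reconnect events:

      # grep -i "connection to the server" /var/log/vmkernel.log
      # grep esx.problem.vmfs.nfs.server /var/log/vobd.log

  On the Wireshark workstation, list the zero window segments in the capture:

      # tshark -r trace.pcap -Y "tcp.analysis.zero_window"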


Environment

VMware vSphere ESXi 6.0
VMware vSphere ESXi 5.5

Resolution

This is a known issue affecting vSphere 5.5 Update 1. For more information on vSphere 5.5 U1, see Intermittent NFS APDs on VMware ESXi 5.5 U1 (2076392).
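
To check whether a particular host is running ESXi 5.5 Update 1, you can query its release and build from the ESXi Shell. This is a minimal sketch, not part of the workaround itself; either command reports the version, update level, and build number:

      # vmware -vl
      # esxcli system version get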

To work around this issue, use one of these options:
  • If you are sufficiently licensed, you can use the Storage I/O Control feature to work around the issue. An Enterprise Plus license is required on all ESXi hosts to use this feature.

    When Storage I/O Control is enabled, it dynamically sets the value of MaxQueueDepth, circumventing the issue.

    For more information, see the vSphere documentation for Storage I/O Control.
  • Reduce the NFS.MaxQueueDepth advanced parameter. Lowering this value has proven to reduce or eliminate the disconnections.

    To set the NFS.MaxQueueDepth advanced parameter using the vSphere Client:

    1. Click the host in the Hosts and Clusters view.
    2. Click the Configuration tab, then click Advanced Settings under Software.
    3. Click NFS, then scroll down to NFS.MaxQueueDepth.
    4. Change the value to 64.
    5. Click OK.
    6. Reboot the host for the change to take effect.

    To set the NFS.MaxQueueDepth advanced parameter using the vSphere Web Client:
    1. Click the Hosts and Clusters tab.
    2. Click the ESXi host you want to modify.
    3. Navigate to Manage > Settings > Advanced System Settings.
    4. Select the variable NFS.MaxQueueDepth.
    5. Change the value to 64 and click OK.
    6. Reboot the host for the change to take effect.

    To set the NFS.MaxQueueDepth advanced parameter on the command line:
    1. Connect to the host using SSH. For more information, see Using ESXi Shell in ESXi 5.x and 6.0 (2004746).
    2. Run the command:

      # esxcfg-advcfg -s 64 /NFS/MaxQueueDepth

    3. Reboot the host for the change to take effect.
    4. After the host has rebooted, confirm the change by running the command:

      # esxcfg-advcfg -g /NFS/MaxQueueDepth
      Value of MaxQueueDepth is 64

    Note: VMware suggests a value of 64. If this is not sufficient to stop the disconnects, halve the value again (for example, to 32 and then 16) until the disconnects cease. An equivalent esxcli form of these commands is sketched below.
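
    The same setting can also be changed and verified with esxcli instead of esxcfg-advcfg. This is an equivalent sketch of the commands above, not a separate procedure; a reboot is still required for the change to take effect:

      # esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64
      # esxcli system settings advanced list -o /NFS/MaxQueueDepth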



Additional Information

Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts
Configuring Flow Control on VMware ESXi and VMware ESX
Using ESXi Shell in ESXi 5.x and 6.x
Intermittent NFS APDs on VMware ESXi 5.5 U1