File opens are stuck on NFS mount point in vSAN File Service

Article ID: 326680

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • File opens on the NFSv4.1 mount hang indefinitely.

  • No error message appears on the terminal.

Environment

VMware vSAN 7.x

Cause

  • In rare cases after a server failover, the NFSv4.1 client does not send RECLAIM_COMPLETE to the NFS server.

  • RFC 5661 (NFSv4.1) requires the client to send this operation. Without it, the server may stay in the grace period for that particular client, so all of that client's I/O and locking operations hang indefinitely.

  • This is not a vSAN File Service issue: the Linux NFS client fails to send RECLAIM_COMPLETE to the NFS server running in the vSAN File Service container.

Resolution

To confirm that file opens are hung, run a packet capture on the NFS client OS as follows:

(1) Install tcpdump if it is not installed.

  • On a CentOS client, run: yum install tcpdump
  • On an Ubuntu client, run: apt install tcpdump

(2) Find the network interface to use with tcpdump.

tcpdump -D

The output lists all network interfaces that tcpdump can capture packets from. Pick the first interface.
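On clients with more than one interface, the first entry in the list is not always the one that reaches the File Service containers. As a cross-check, the sketch below assumes the iproute2 ip command is installed. It uses a hypothetical helper, iface_from_route, to pull the interface name out of the output of ip route get.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original procedure):
# reads `ip route get` output on stdin and prints the token after "dev".
iface_from_route() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a live client (CONTAINER_IP is a placeholder you must set):
#   ip route get "$CONTAINER_IP" | iface_from_route
```

If the name printed here differs from the first entry of tcpdump -D, the interface on the route to the container IP is likely the one to capture on.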

(3) Enable rpcdebug

sysctl -w sunrpc.nfs_debug=1023
rpcdebug -s all -m nfs
rpcdebug -s all -m rpc


(4) Run a tcpdump command to capture packets to and from each File Service container IP.

tcpdump -C 1024 -W 5 -G 60000 -s 350 -i <network interface> host <one container IP> -w <packet capture file>
For example: tcpdump -C 1024 -W 5 -G 60000 -s 350 -i ens32 host 10.x.x.53 -w /sdb/tcpdump_container_10.x.x.53.pcap
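When several container IPs are configured, the same command has to be repeated once per IP. The sketch below uses a hypothetical helper, print_capture_cmds, that only prints one command per IP with the options shown above. The interface, output directory, and addresses are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original procedure):
# prints one tcpdump command per container IP, using the options from step (4).
# Arguments: <interface> <output directory> <IP> [<IP> ...]
print_capture_cmds() {
    iface=$1
    outdir=$2
    shift 2
    for ip in "$@"; do
        printf 'tcpdump -C 1024 -W 5 -G 60000 -s 350 -i %s host %s -w %s/tcpdump_container_%s.pcap\n' \
            "$iface" "$ip" "$outdir" "$ip"
    done
}

# Example with placeholder addresses (substitute your container IPs):
#   print_capture_cmds ens32 /sdb 192.0.2.53 192.0.2.54
```

Review the printed commands, then run each one in its own terminal or in the background so that all captures run at the same time.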

(5) Stop the packet capture and disable rpcdebug.

killall tcpdump
rpcdebug -c all -m nfs
rpcdebug -c all -m rpc


Note: Ensure there is enough disk space for the packet capture files. Without enough space, the packet capture file will be rolled over. Also, if File Service is configured with multiple container IPs, run a separate tcpdump command for each container IP.

What to look for in the packet capture:

  1. NFSv4 OPEN operations keep failing with NFS4ERR_GRACE until the client is rebooted.
  2. At the point where NFS4ERR_GRACE first appears, the client has created a new NFS session with the server but has not sent the RECLAIM_COMPLETE operation afterward.
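Both checks can be done with Wireshark display filters. The sketch below assumes tshark is installed, which the procedure above does not require, and that the NFS dissector exposes the fields nfs.nfsstatus4 and nfs.opcode. Verify the field names against your Wireshark version. The numeric values come from RFC 5661: NFS4ERR_GRACE is 10013 and RECLAIM_COMPLETE is operation 58.

```shell
#!/bin/sh
# Numeric values from RFC 5661.
NFS4ERR_GRACE=10013
OP_RECLAIM_COMPLETE=58

# Display filters; the field names are assumed from the Wireshark NFS
# dissector and may differ between versions.
GRACE_FILTER="nfs.nfsstatus4 == ${NFS4ERR_GRACE}"
RECLAIM_FILTER="nfs.opcode == ${OP_RECLAIM_COMPLETE}"

# Usage (PCAP is a placeholder for your capture file):
#   tshark -r "$PCAP" -Y "$GRACE_FILTER" | head -n 1   # first NFS4ERR_GRACE reply
#   tshark -r "$PCAP" -Y "$RECLAIM_FILTER"             # any RECLAIM_COMPLETE calls or replies
```

If the second command prints nothing for a capture that covers the failover, that is consistent with the client never having sent RECLAIM_COMPLETE. A capture started after the failover cannot show this either way.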

Workaround:

Unmount and remount the file share on the client.
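As a rough illustration, the sketch below prints the two commands for one share. The helper print_remount_cmds is hypothetical, the share and mount point are placeholders, and the option vers=4.1 assumes a Linux client whose mount.nfs accepts that form.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original article):
# prints the unmount/mount pair for one NFSv4.1 share.
# Arguments: <server>:<export path> <local mount point>
print_remount_cmds() {
    share=$1
    mountpoint=$2
    printf 'umount %s\n' "$mountpoint"
    printf 'mount -t nfs -o vers=4.1 %s %s\n' "$share" "$mountpoint"
}

# Example with placeholder values (substitute your own share and path):
#   print_remount_cmds 192.0.2.53:/vsanfs/share01 /mnt/share01
```

If processes still hold files open on the share, a plain umount may report that the target is busy. Stop those processes first, or consider umount -f at your own discretion.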

Additional Information