High Write Latency and Transaction Failures in Pods Using NFS Storage
Article ID: 431300
Products
VMware Tanzu Kubernetes Grid
VMware vSphere ESXi
Issue/Introduction
- Applications running on Tanzu Kubernetes Grid (TKG) experience severe write latency during peak workload periods.
- The issue resolves temporarily after a pod restart.
Cause
- The NFS client is negotiating a 64KB block size (rsize=65536, wsize=65536).
- To confirm this, log in to the TKG worker node.
- Check the mount:
mount | grep nfs | grep <NFS IP address>
- If the output shows rsize=65536 and wsize=65536, the NFS client is negotiating a 64KB block size.
- In a TKGm environment (typically running on vSphere/NSX), transferring data in small 64KB chunks creates significant per-request overhead.
- To transfer just 1GB of data, the client and server must exchange roughly 16,000 requests (1GB / 64KB = 16,384) with this configuration. With a 1MB block size, the same transfer needs about 1,024 requests.
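The request counts in the Cause section can be checked with a little shell arithmetic. This is a rough sketch only: the helper name `requests_for` is made up for illustration, and it assumes every request carries a full rsize/wsize payload, which real traffic will not always do.

```shell
# Hypothetical helper: number of full-size NFS requests needed to move
# $1 bytes when each request carries at most $2 bytes (rounded up).
requests_for() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

one_gib=$(( 1024 * 1024 * 1024 ))

echo "64KB blocks: $(requests_for "$one_gib" 65536) requests"
echo "1MB blocks:  $(requests_for "$one_gib" 1048576) requests"
```

This prints 16384 requests for the 64KB case and 1024 for the 1MB case, a 16x reduction in round trips for the same amount of data.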
Resolution
Increasing the NFS block size to 1MB (1048576) reduces IOPS overhead and improves throughput for high-workload applications.
- Back up the PV configuration file
kubectl get pv <pv-name> -o yaml > <pv-name>-backup.yaml
- Edit the PV
kubectl edit pv <pv-name>
- Add the following mountOptions under spec
mountOptions:
  - hard
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
  - noatime
Note: The mountOptions: key must sit at the same indentation level as capacity: and nfs: under spec.
Sample file after editing
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <PV Name>
  ...
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 100Gi
  mountOptions: # <--- NEW SECTION
    - hard
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - noatime
  nfs:
    path: /#######/<PV Name>
    server: <NFS IP address>
  ...
- Save and Exit (Press Esc, type :wq, and hit Enter)
- Recreate the pod so that the volume is remounted with the new options
kubectl delete pod <pod-name> -n <namespace>
- SSH into the worker node and run the following command to confirm that the block size has changed
mount | grep <NFS IP Address>
The output should show rsize=1048576 and wsize=1048576, matching the mountOptions added to the PV.
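The final check can also be scripted instead of read by eye. The sketch below is illustrative only: the helper `mount_opt` is a made-up name, and the sample mount line (including the 192.0.2.10 address and the paths) is invented so that the snippet runs anywhere. On a real worker node, replace it with the output of the mount command above.

```shell
# Hypothetical helper: extract one numeric mount option (e.g. rsize)
# from a single line of `mount` output.
#   $1 = option name, $2 = mount output line
mount_opt() {
  printf '%s\n' "$2" | sed -n "s/.*[(,]$1=\([0-9][0-9]*\).*/\1/p"
}

# Invented sample line for illustration. On the worker node, use instead:
#   line=$(mount | grep <NFS IP Address> | head -n 1)
line='192.0.2.10:/export/pv1 on /mnt/pv1 type nfs4 (rw,noatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp)'

expected=1048576
for opt in rsize wsize; do
  actual=$(mount_opt "$opt" "$line")
  if [ "$actual" = "$expected" ]; then
    echo "$opt=$actual OK"
  else
    echo "$opt=$actual, expected $expected"
  fi
done
```

If either option still reports 65536, the pod is most likely still using the old mount and the values should be checked again once it has been recreated.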