High Write Latency and Transaction Failures in Pods Using NFS Storage

Article ID: 431300

Updated On:

Products

VMware Tanzu Kubernetes Grid

VMware vSphere ESXi

Issue/Introduction

Applications running on Tanzu Kubernetes Grid (TKG) experience severe write latency and transaction failures during peak workload periods.
Restarting the pod resolves the issue only temporarily.

Environment

TKG 2.4.1

ESXi 8.0 U3

Cause

The NFS client is negotiating a 64 KB block size (rsize=65536, wsize=65536). To confirm:
Log in to the TKG worker node.

Check the Mount:

mount | grep nfs | grep <NFS IP address>
The output shows that the NFS client is negotiating a 64 KB block size (rsize=65536, wsize=65536).
In a TKGm environment (likely running on vSphere/NSX), transferring data in small 64 KB chunks creates significant per-request overhead.
To transfer just 1 GB of data with this configuration, the client and server have to exchange roughly 16,000 requests.
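
The request count above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming every request carries one full block and ignoring any protocol-level batching; the function name requests_needed is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of requests needed to move a payload
# at a given transfer size (both in bytes), using ceiling division.
# Assumption: each request carries exactly one full block.
requests_needed() {
  local total_bytes=$1 block_bytes=$2
  echo $(( (total_bytes + block_bytes - 1) / block_bytes ))
}

one_gib=$(( 1024 * 1024 * 1024 ))

# 64 KB blocks (rsize/wsize=65536)
echo "64 KB blocks: $(requests_needed "$one_gib" 65536) requests"
# 1 MB blocks (rsize/wsize=1048576)
echo "1 MB blocks:  $(requests_needed "$one_gib" 1048576) requests"
```

Under these assumptions, 1 GB needs 16384 requests at 64 KB and 1024 requests at 1 MB, a 16-fold reduction.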

Resolution

Increasing the NFS block size to 1MB (1048576) reduces IOPS overhead and improves throughput for high-workload applications.

Back up the PV configuration file

kubectl get pv <pv-name> -o yaml > <pv-name>-backup.yaml
Edit the PV

kubectl edit pv <pv-name>

Add the following mountOptions to the spec:

mountOptions:
- hard
- nfsvers=4.1
- rsize=1048576
- wsize=1048576
- noatime

Note: mountOptions: must be at the same indentation level as capacity: and nfs: under spec.

Sample file after editing

apiVersion: v1
kind: PersistentVolume
metadata:
  name: <PV Name>
  ...
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Gi
  mountOptions:           # <--- NEW SECTION
  - hard
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
  - noatime
  nfs:
    path: /#######/<PV Name>
    server: <NFS IP address>
...

Save and Exit (Press Esc, type :wq, and hit Enter)
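
As a possible alternative to the interactive edit, the same mountOptions might be applied with kubectl patch. This is a sketch only, not part of the documented procedure: the helper build_mount_patch is hypothetical, and a JSON merge patch replaces the entire mountOptions list, so any options already present on the PV would need to be included.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a JSON merge patch that sets spec.mountOptions
# to the options passed as arguments.
build_mount_patch() {
  local opts="" opt
  for opt in "$@"; do
    # Append each option as a quoted JSON string, comma-separated.
    opts="${opts:+$opts,}\"$opt\""
  done
  printf '{"spec":{"mountOptions":[%s]}}' "$opts"
}

patch=$(build_mount_patch hard nfsvers=4.1 rsize=1048576 wsize=1048576 noatime)
echo "$patch"

# To apply, replace <pv-name> with the real PV name and run:
#   kubectl patch pv <pv-name> --type merge -p "$patch"
```

Printing the patch first makes it easy to review before it is applied. Take the PV backup beforehand, as in the documented steps.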
Recreate the Pod

kubectl delete pod <pod-name> -n <namespace>
SSH into the worker node and run the following command to confirm that the block size has changed:
mount | grep <NFS IP Address>

The output should show rsize=1048576 and wsize=1048576, matching the mountOptions added to the PV.
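
The check can also be scripted. The sketch below assumes the mount output lists options in the usual name=value form; the helper mount_opt is hypothetical, and the sample line is illustrative only, not captured from a real node. The server can cap the negotiated size, so the reported value may be lower than the requested one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads mount output on stdin and prints the numeric
# value of the named option (for example rsize or wsize) from the first match.
mount_opt() {
  local name=$1
  grep -o "${name}=[0-9]*" | head -n 1 | cut -d= -f2
}

# Illustrative sample line (assumption, not real output).
sample='192.0.2.10:/export/data on /mnt/data type nfs4 (rw,noatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp)'

rsize=$(printf '%s\n' "$sample" | mount_opt rsize)
wsize=$(printf '%s\n' "$sample" | mount_opt wsize)

if [ "$rsize" = "1048576" ] && [ "$wsize" = "1048576" ]; then
  echo "OK: rsize=$rsize wsize=$wsize"
else
  echo "Unexpected: rsize=$rsize wsize=$wsize"
fi
```

On a worker node, the sample line would be replaced by the real output, for example: mount | grep <NFS IP Address> | mount_opt rsize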
