High Write Latency and Transaction Failures in Pods Using NFS Storage

Article ID: 431300

Updated On:

Products

VMware Tanzu Kubernetes Grid

VMware vSphere ESXi

Issue/Introduction

  • Applications running on Tanzu Kubernetes Grid (TKG) experience severe write latency and transaction failures during peak workload periods.
  • Restarting the affected pod resolves the issue only temporarily.

Environment

TKG 2.4.1

ESXi 8.0 U3

Cause

  • The NFS client is negotiating a 64 KB block size (rsize=65536, wsize=65536).
  • To confirm this, log in to the TKG worker node and check the mount:

    mount | grep nfs | grep <NFS IP address>

  • The output shows rsize=65536 and wsize=65536 for the NFS mount.
  • In a TKGm environment (likely running on vSphere/NSX), transferring data in small 64 KB chunks creates significant per-request overhead.
  • To transfer just 1 GB of data, the client and server must exchange roughly 16,000 requests with this configuration.
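
The request count quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming one request per block and ignoring protocol overhead, retries, and any coalescing done by the client. The helper name requests_needed is illustrative only.

```shell
#!/bin/sh
# Number of requests needed to move a payload at a given transfer size.
# Assumption: one request per block, no retries or coalescing.
requests_needed() {
    payload_bytes=$1
    block_bytes=$2
    # Round up so a partial final block still counts as one request.
    echo $(( (payload_bytes + block_bytes - 1) / block_bytes ))
}

one_gib=$((1024 * 1024 * 1024))
echo "64 KiB blocks: $(requests_needed "$one_gib" 65536) requests"
echo "1 MiB blocks:  $(requests_needed "$one_gib" 1048576) requests"
```

Under these assumptions, a 64 KB block size needs 16,384 requests per gigabyte, while a 1 MB block size needs 1,024, a 16-fold reduction.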

Resolution

Increasing the NFS block size to 1 MB (rsize=1048576, wsize=1048576) reduces the number of requests needed per transfer and improves throughput for high-workload applications.

  1. Back up the PV configuration file 

    kubectl get pv <pv-name> -o yaml > <pv-name>-backup.yaml

  2. Edit the PV 

    kubectl edit pv <pv-name>

  3. Add the following mountOptions to the spec:

    mountOptions:
      - hard
      - nfsvers=4.1
      - rsize=1048576
      - wsize=1048576
      - noatime

    Note: mountOptions: must sit at the same indentation level as capacity: and nfs: under spec.

    Sample PV after editing:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: <PV Name>
      ...
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 100Gi
      mountOptions:           # <--- NEW SECTION
      - hard
      - nfsvers=4.1
      - rsize=1048576
      - wsize=1048576
      - noatime
      nfs:
        path: /#######/<PV Name>
        server: <NFS IP address>
    ...

  4. Save and exit (in the default vi editor, press Esc, type :wq, and press Enter).

  5. Recreate the pod so that the volume is remounted with the new options:

    kubectl delete pod <pod-name> -n <namespace>

  6. SSH into the worker node and confirm that the block size has changed:

    mount | grep <NFS IP address>

    The output should show rsize=1048576 and wsize=1048576, as set in step 3.
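
The check in step 6 can also be scripted. The following is a minimal sketch, assuming the mount output lists its options in trailing parentheses separated by commas. The helper name mount_opt and the sample line (server address, export path, pod path) are hypothetical; on a worker node, feed in real lines from the mount command in step 6.

```shell
#!/bin/sh
# Print the value of one mount option (e.g. rsize) from a line of `mount` output.
mount_opt() {
    line=$1
    opt=$2
    # Split on parentheses and commas, then keep the value of the named option.
    printf '%s\n' "$line" | tr '(),' '\n\n\n' | sed -n "s/^${opt}=//p" | head -n 1
}

# Hypothetical sample line, for illustration only.
sample='192.0.2.10:/export/pv1 on /var/lib/kubelet/pods/x/volumes/kubernetes.io~nfs/pv1 type nfs4 (rw,noatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp)'

expected=1048576
for name in rsize wsize; do
    actual=$(mount_opt "$sample" "$name")
    if [ "$actual" = "$expected" ]; then
        echo "$name=$actual OK"
    else
        echo "$name=$actual, expected $expected"
    fi
done
```

If the reported values are lower than requested, the NFS server may be capping the transfer size, because the effective rsize and wsize are negotiated between client and server.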