Kubernetes worker node rebooted with boot id error "NFS: Server wrote zero bytes"

Article ID: 431950

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • The worker node running the application reboots unexpectedly, and at the same time it logs the warning "rebooted with boot id: ########-####-####-####-######."

  • vmkernel.log shows the error: "kernel: NFS: Server wrote zero bytes, expected 65536."

  • vmware.log shows the error: "vcpu-0 - CPU reset: hard (mode Emulation)."

 

Environment

TKGm: 2.4.x

TKGm: 2.5.x

Cause

This issue is caused by a protocol negotiation failure between the Ubuntu 20.04 kernel and the NetApp storage array, specifically involving NFS v4.1 Parallel NFS (pNFS) as it interacts with the Kubernetes Persistent Volume (PV).

Resolution

To resolve the issue, follow the numbered steps below to migrate the Persistent Volume (PV) from NFS v4.1/4.2 to NFS v4.0. This process disables pNFS negotiation.

  1. Scale Down Workloads: Identify all Deployments or StatefulSets using the PV and set their replicas to 0. Do not proceed until the pod list confirms that no pods are still running for the workload.
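As an illustration, assuming a Deployment named my-app in namespace my-namespace (both hypothetical names), the scale-down might look like this against a live cluster:

```shell
# Hypothetical names: substitute your own Deployment/StatefulSet and namespace.
# Record the current replica count first so it can be restored in step 7:
kubectl get deployment my-app -n my-namespace -o jsonpath='{.spec.replicas}'

# Scale the workload down to zero:
kubectl scale deployment my-app -n my-namespace --replicas=0

# Confirm no pods remain for the workload before continuing:
kubectl get pods -n my-namespace -l app=my-app
```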

  2. Verify Reclaim Policy: Confirm the PV Reclaim Policy is set to Retain in the YAML before proceeding. If it is set to Delete, deleting the PV will permanently destroy the underlying data.
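As a sketch, this is the field to check in the PV definition (field names per the Kubernetes PersistentVolume API; the name is a placeholder):

```yaml
# Relevant fragment of the PV definition; only persistentVolumeReclaimPolicy
# matters for this check.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <pv-name>
spec:
  persistentVolumeReclaimPolicy: Retain   # must be Retain, not Delete
```

If the policy is Delete, it can be switched in place before any deletion with: kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'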

  3. Back Up Definitions: Export the current PersistentVolume and PersistentVolumeClaim to YAML files.
    kubectl get pv <pv-name> -o yaml > pv-backup.yaml
    kubectl get pvc <pvc-name> -n <namespace> -o yaml > pvc-backup.yaml

  4. Delete Resources: Delete the PVC first, then the PV:
    kubectl delete pvc <pvc-name> -n <namespace>
    kubectl delete pv <pv-name>

  5. Edit the PV YAML: Open pv-backup.yaml and update the mountOptions section.
    Change vers=4.2 (or 4.1) to vers=4.0.
    Delete the following auto-generated fields: resourceVersion, uid, creationTimestamp, finalizers, and status.
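After editing, the relevant parts of pv-backup.yaml might look like the sketch below (the capacity, access mode, server, and path values are placeholders; keep the values from your own backup file):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <pv-name>           # keep the original name; uid, resourceVersion,
                            # creationTimestamp, and finalizers are removed
spec:
  capacity:
    storage: 10Gi           # placeholder size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - vers=4.0              # was vers=4.1 or vers=4.2; 4.0 disables pNFS
  nfs:
    server: <nfs-server-ip>
    path: <export-path>
# The auto-generated status: block is removed entirely.
```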

  6. Recreate the PV and PVC: Apply the edited PV file first and wait for its status to show Available. Once it is Available, apply the PVC and confirm its status shows Bound.
    kubectl apply -f pv-backup.yaml
    kubectl apply -f pvc-backup.yaml

  7. Scale Up Workloads: Restore the original replica counts for your Deployments or StatefulSets.

  8. Verify Active Version: Once pods are running, confirm that the mount uses NFSv4.0 and that pNFS is inactive.
    Inside the pod:
    cat /proc/mounts | grep <nfs-server-ip>   (look for vers=4 and minorversion=0)
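As a quick illustration of what the check should find, the following sketch greps a sample /proc/mounts line (the server IP and export path are made up) for the NFSv4.0 marker:

```shell
# Hypothetical /proc/mounts entry after remounting with vers=4.0:
line='192.0.2.10:/export/vol1 /data nfs4 rw,relatime,vers=4.0,minorversion=0,addr=192.0.2.10 0 0'

# On a real pod you would instead run: grep <nfs-server-ip> /proc/mounts
# vers=4.0 (shown by some kernels as vers=4,minorversion=0) means pNFS is out of play.
if printf '%s' "$line" | grep -q 'vers=4\.0'; then
  status="NFSv4.0 mount confirmed"
else
  status="still on NFSv4.1/4.2 - pNFS may be active"
fi
echo "$status"
```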