The following error is displayed for the affected Supervisor Control Plane VM in the vSphere UI:
"Node is not healthy and is not accepting pods. Details: Kubelet stopped posting node status."
Additionally, when running the command kubectl get nodes, the affected node shows a NotReady status. As a result, pods cannot be scheduled on the affected node.
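As a quick triage step, the NotReady nodes can be filtered out of the kubectl output. A minimal sketch, run here against sample output (the node names are placeholders, not taken from a real Supervisor); in practice, pipe the live kubectl get nodes command into the same awk filter:

```shell
# Sample output in the format produced by `kubectl get nodes`.
kubectl_output='NAME                               STATUS     ROLES                  AGE   VERSION
4201a7b9c0d1e2f3a4b5c6d7e8f9a0b1   Ready      control-plane,master   90d   v1.24.9
4201b8c9d0e1f2a3b4c5d6e7f8a9b0c2   NotReady   control-plane,master   90d   v1.24.9'

# Print only nodes whose STATUS column is NotReady (skip the header row).
echo "$kubectl_output" | awk 'NR > 1 && $2 == "NotReady" { print $1 }'
```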
vSphere with Tanzu 7.x
vSphere with Tanzu 8.x
The root (/) partition of the Control Plane VM is full, which prevents the kubelet from operating properly and stops the VM from accepting pods.
To resolve the issue, follow these steps:
1. SSH into the Affected Control Plane VM:
Follow the instructions in the "How to SSH into Supervisor Control Plane VMs" section of the KB article Troubleshooting vSphere with Tanzu (TKGS) Supervisor Control Plane VMs.
Use the root username and the password obtained from that article to log in to the affected VM.
2. Check Disk Space Usage:
On the affected VM, run the following command to check disk usage:
df -h
This shows the disk usage of every partition; on an affected node, the root (/) partition will typically be 100% full.
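The df output can also be filtered programmatically to flag filesystems near capacity. A minimal sketch, assuming the standard six-column df -h layout (Use% in column 5, mount point in column 6) and an illustrative 90% threshold:

```shell
# Flag any filesystem at or above 90% usage; on an affected VM the
# root filesystem typically shows 100%.
df -h | awk 'NR > 1 { gsub(/%/, "", $5); if ($5 + 0 >= 90) print $6, $5 "%" }'
```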
3. Clean Up Space on the Root Partition:
Follow the steps outlined in the KB article: Supervisor Cluster Unstable After Upgrade to clean up disk space.
Additionally, download and run the following Python scripts to clean up stale resources:
4. Check Other Control Plane Nodes:
Repeat the disk-usage check and, if needed, the cleanup process on the other Control Plane nodes to ensure they are not affected by the same issue.
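When checking several nodes in turn, a small helper makes the threshold check repeatable. A sketch assuming an illustrative 90% cleanup threshold (the function name and threshold are not from the article); the usage figure would come from running df -h / on each node over SSH:

```shell
# Helper: given a usage figure such as "100%", report whether the node
# needs cleanup. The 90% threshold is an illustrative choice.
needs_cleanup() {
    usage="${1%\%}"   # strip the trailing percent sign
    if [ "$usage" -ge 90 ]; then
        echo "cleanup needed (${1} used)"
    else
        echo "ok (${1} used)"
    fi
}

needs_cleanup 100%   # prints: cleanup needed (100% used)
needs_cleanup 42%    # prints: ok (42% used)
```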
vSphere Kubernetes Supervisor Root Disk Space Full at 100%: https://knowledge.broadcom.com/external/article/383369