Root disk usage has reached 100% on one or more Supervisor Cluster Control Plane VMs in a vSphere Kubernetes Supervisor environment. The root filesystem runs out of space and the nodes report DiskPressure.
When logged in to a Supervisor Control Plane VM over SSH, the root disk shows 100% usage:
root@4201a23b34567890c10de1112fg134 [ ~ ]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root ##G ##G ##G 100% /
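A small read-only shell sketch can print just the root usage percentage, so you don't have to read the df table by eye. It assumes GNU coreutils df, which supports `--output=pcent`. The function name `root_usage_pct` is hypothetical. The script only reads filesystem statistics and deletes nothing.

```shell
#!/usr/bin/env bash
# Read-only sketch: report root filesystem usage as a bare integer percentage.
# Assumes GNU coreutils df (supports --output). Nothing is modified or deleted.

# Hypothetical helper name, used for illustration only.
root_usage_pct() {
  # --output=pcent prints a "Use%" header line, then a value such as " 100%";
  # keep the last line and strip spaces and the percent sign.
  df --output=pcent / | tail -n 1 | tr -d ' %'
}

usage=$(root_usage_pct)
echo "Root filesystem usage: ${usage}%"
if [ "$usage" -ge 100 ]; then
  echo "Root disk is full. Contact VMware by Broadcom Technical Support before removing anything."
fi
```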
Many system processes will fail and continue to crash while any Supervisor Control Plane VM is at full root disk usage.
vSphere 8.0 with Tanzu
vSphere 7.0 with Tanzu
This issue can occur regardless of whether the environment is managed by Tanzu Mission Control (TMC).
Disk usage on the Supervisor Control Plane VMs can build up for several reasons, including:
Log Accumulation: /var/log
etcd snapshots and data
Container/Pod Logs: /var/log/pods
Leftover unused images and ReplicaSets that build up over time from previous Supervisor cluster upgrades
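A read-only sketch can total each contributor's directory with du to show which one is largest. `/var/log` and `/var/log/pods` come from the list above. `/var/lib/etcd` and `/var/lib/containerd` are assumed default locations for etcd data and container images and may differ on a given Supervisor release. The function `report_dir_usage` is hypothetical. Nothing is deleted, and per the warning in this article, any cleanup should be done with Support.

```shell
#!/usr/bin/env bash
# Read-only sketch: show how much space each suspected contributor uses.
# /var/log and /var/log/pods are from the KB text; /var/lib/etcd and
# /var/lib/containerd are ASSUMED locations and may not match your build.
CANDIDATE_DIRS=(/var/log /var/log/pods /var/lib/etcd /var/lib/containerd)

# Hypothetical helper name, used for illustration only.
report_dir_usage() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      # -s: one total per directory, -h: human-readable sizes,
      # -x: do not cross into other mounted filesystems.
      # Permission errors are hidden; du still prints the total it could read.
      du -shx "$dir" 2>/dev/null
    else
      echo "not present: $dir"
    fi
  done
}

report_dir_usage "${CANDIDATE_DIRS[@]}"
```

Because `/var/log/pods` is nested under `/var/log`, its size is also counted in the `/var/log` total.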
If root disk usage on a Supervisor Control Plane VM reaches 100%, multiple system-critical services will fail.
VMware by Broadcom Engineering is aware of this issue and is working on fixes, to be included in an upcoming patch, for the known issues below:
Please reach out to VMware by Broadcom Technical Support referencing this KB article for assistance in cleaning up Supervisor disk space.
WARNING: Deleting files without Support's advice can cause further issues or irrecoverable damage to the environment.