vSphere Kubernetes Supervisor Root Disk Space Full at 100%

Article ID: 383369

Products

VMware vSphere with Tanzu, Tanzu Kubernetes Runtime, VMware vSphere 7.0 with Tanzu, vSphere with Tanzu

Issue/Introduction

Root disk usage has reached 100% on one or more Supervisor Control Plane VMs in a vSphere Kubernetes Supervisor environment. The root filesystem has no free space left, which causes DiskPressure issues.

When you connect to a Supervisor Control Plane VM over SSH, the root filesystem shows 100% usage:

  • See "How to SSH into Supervisor Control Plane VMs" in the article "Troubleshooting vSphere with Tanzu (TKGS) Supervisor Control Plane VMs".
    • The floating IP address returned by the decryptK8Pwd Python script may be unreachable because of the disk space issue.
    • Use the IP address assigned directly to a Supervisor Control Plane VM instead of the floating IP address.
  • root@4201a23b34567890c10de1112fg134 [ ~ ]# df -h

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/root        ##G   ##G   ##G 100% /
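For scripted monitoring, the `df -h` check above can be approximated in Python. This is a minimal, read-only sketch that assumes python3 is available wherever it runs. The function names and the `threshold` parameter are hypothetical, and `df -h` remains the check this article describes. The sketch follows GNU df's Use% calculation: used divided by (used + available), rounded up. It only reads filesystem statistics and never modifies files.

```python
import shutil


def root_usage_percent(path: str = "/") -> int:
    """Approximate df's Use% for the filesystem holding `path` (read-only)."""
    usage = shutil.disk_usage(path)
    # df divides by used + available (blocks reserved for root are excluded)
    # and rounds up, so a nearly full disk is reported as 100%.
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises.
    return (100 * usage.used + denominator - 1) // denominator


def is_root_disk_full(path: str = "/", threshold: int = 100) -> bool:
    """Hypothetical helper: True when usage is at or above `threshold` percent."""
    return root_usage_percent(path) >= threshold


if __name__ == "__main__":
    pct = root_usage_percent("/")
    print(f"/ is {pct}% used")
    if is_root_disk_full("/"):
        print("Root disk is full; contact Support before deleting anything.")
```

Integer arithmetic is used because a float percentage such as 99.9999 would otherwise fail a 100% threshold that df itself would report as met.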

Many system processes fail and repeatedly crash while the root disk of any Supervisor Control Plane VM is full.

  • This includes the service that assigns the floating IP address to one of the Supervisor Control Plane VMs.

Environment

vSphere 8.0 with Tanzu

vSphere 7.0 with Tanzu

This issue can occur whether or not the environment is managed by Tanzu Mission Control (TMC).

Cause

High root disk usage on the Supervisor Control Plane VMs can have several causes:

  • Log accumulation: /var/log
  • etcd snapshots and data
  • Container/Pod logs: /var/log/pods
  • Unused images and ReplicaSets left over from previous Supervisor cluster upgrades and built up over time
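To see which of the log locations listed above is taking up the most space, you could use a read-only size summary like the minimal sketch below. It assumes python3 on the VM. `dir_size_bytes` and `human` are hypothetical helper names, and only the two directory paths come from this article. The sketch sums the apparent sizes of regular files and does not follow symlinks. Its totals can therefore differ from `du`, which counts allocated blocks. Nothing is deleted.

```python
import os
import stat


def dir_size_bytes(root: str) -> int:
    """Total apparent size of regular files under `root` (read-only).

    Symlinks are not followed or counted, and files that vanish or cannot
    be read during the walk are skipped.
    """
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # rotated or removed while walking
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def human(n: int) -> str:
    """Format a byte count using binary units."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    # /var/log/pods is nested inside /var/log, so its size is also
    # included in the /var/log total.
    for path in ("/var/log", "/var/log/pods"):
        print(f"{path}: {human(dir_size_bytes(path))}")
```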

Resolution

If the root disk of a Supervisor Control Plane VM reaches 100% usage, multiple system-critical services fail.

VMware by Broadcom Engineering is aware of the following known issues and is working on fixes to be included in an upcoming patch:

  • Failed log rotation of /var/log/vmware/upgrade-ctl-cli.log* files, producing multiple 1 GB files with an additional numeric suffix
  • Unused images built up over time and left over from multiple Supervisor cluster upgrades
  • Unused ReplicaSets built up over time and left over from multiple Supervisor cluster upgrades
  • Further reducing disk space consumed by the system journal and other logging services
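Before you contact Support, you might want to record how much space the first known issue is using. The minimal sketch below assumes python3 on the VM. `list_upgrade_ctl_logs` is a hypothetical helper, and only the /var/log/vmware/upgrade-ctl-cli.log* pattern comes from this article. The sketch only lists and measures matching files. In line with the warning below, it never removes or truncates them.

```python
import glob
import os
from typing import List, Tuple


def list_upgrade_ctl_logs(log_dir: str = "/var/log/vmware") -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for upgrade-ctl-cli.log* files, largest first.

    Read-only: files are only listed and measured, never modified.
    """
    results = []
    for path in glob.glob(os.path.join(log_dir, "upgrade-ctl-cli.log*")):
        try:
            results.append((path, os.path.getsize(path)))
        except OSError:
            continue  # file disappeared between listing and stat
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    entries = list_upgrade_ctl_logs()
    for path, size in entries:
        print(f"{size / 1024**3:6.2f} GiB  {path}")
    total = sum(size for _path, size in entries)
    print(f"{len(entries)} file(s), {total / 1024**3:.2f} GiB total")
```

The printed list is something you could attach to a Support request that references this KB article.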

Please contact VMware by Broadcom Technical Support and reference this KB article for assistance in cleaning up Supervisor disk space.

WARNING: Deleting files without guidance from Support can cause further issues in the environment or leave it irrecoverably damaged.