High Disk Utilization on TCA CP Root Partition Due to Stale Container Images



Article ID: 422397



Products

VMware Telco Cloud Platform
VMware Telco Cloud Automation

Issue/Introduction

  • The TCA Control Plane (TCA-CP) appliance exhibits elevated disk utilization on the root partition (/), exceeding 85%.

  • Inspecting the filesystem with df -h shows the root filesystem nearing capacity:
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/vg_system-lv_system  118G   97G   15G  87% /
  • High usage may also be correlated with overlay filesystems or containerd task directories, indicating an accumulation of container artifacts:
    overlay 118G 97G 15G 87% /run/containerd/io.containerd.runtime.v2.task/
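The symptoms above can be confirmed quickly from a shell on the appliance. The following is a minimal sketch, not part of the TCA tooling; the `check_root_usage` helper and the 85% threshold are assumptions chosen to match the figures in this article:

```shell
# Report whether the root filesystem exceeds a usage threshold (default 85%).
check_root_usage() {
  threshold=${1:-85}
  # `df --output=pcent` (GNU coreutils) prints the Use% column; keep digits only.
  pct=$(df --output=pcent / | tail -1 | tr -dc '0-9')
  if [ "$pct" -gt "$threshold" ]; then
    echo "WARN: / is at ${pct}% (threshold ${threshold}%)"
  else
    echo "OK: / is at ${pct}%"
  fi
}

check_root_usage 85

# To see which paths on the root filesystem consume the most space:
#   du -xh / 2>/dev/null | sort -rh | head
```

If usage is high and the largest consumers sit under containerd paths, the cause below is a likely match.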

Environment

  • TCA 3.2
  • TCP 5.0.2

Cause

  1. This issue is caused by the failure of the tca-deploy cleanup script to remove stale container images following an upgrade (e.g., from 3.0 to 3.2).
  2. The cleanup script fails due to an authentication mismatch involving Kubernetes credentials:
    1. Certificate Rotation: Upon startup in TCA 3.2, internal certificates (including Control Plane certificates) may be rotated to address expiry issues.

    2. Config Mismatch: The rotation process updates the system configuration file at /etc/kubernetes/admin.conf. However, it may fail to synchronize these changes to the root user's default kubeconfig at /root/.kube/config.

    3. Script Failure: The tca-deploy cleanup mechanism relies on kubectl commands that reference /root/.kube/config by default. Because this file contains expired or invalid credentials, the cleanup commands fail, leaving stale images on the disk which consume available space.
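The credential mismatch described above can be checked directly. The sketch below uses a hypothetical helper, `cert_expiry`, which is not part of TCA; it decodes the client certificate embedded in a kubeconfig and prints its expiry. A past or mismatched date on /root/.kube/config is consistent with this failure mode:

```shell
# Hypothetical helper (not part of TCA): print the notAfter date of the
# client certificate embedded in a kubeconfig file.
cert_expiry() {
  grep -m1 'client-certificate-data' "$1" | awk '{print $2}' \
    | base64 -d | openssl x509 -noout -enddate
}

# Only meaningful on the appliance itself; the files are skipped if absent.
for kc in /etc/kubernetes/admin.conf /root/.kube/config; do
  [ -f "$kc" ] && printf '%s: %s\n' "$kc" "$(cert_expiry "$kc")"
done
: # keep a clean exit status when the files are absent
```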

Resolution

Resolved in TCA 3.3.0.1



To resolve this issue, the root user's kubeconfig must be manually updated with the valid credentials from admin.conf, and the deployment pod must be restarted to trigger the cleanup logic.

Procedure:

  1. SSH into the TCA Control Plane appliance as the admin user and switch to root:

    su -
    
  2. Verify the current disk usage to establish a baseline:

    df -h /
    
  3. Back up the existing (invalid) kubeconfig file:

    cp /root/.kube/config /root/.kube/config.bak
    
  4. Copy the valid administrative configuration to the root user's directory:

    cp /etc/kubernetes/admin.conf /root/.kube/config
    
  5. Restart the tca-deploy pod. This action forces the pod to re-initialize and execute the image pruning script using the corrected credentials:

    # Identify the namespace (typically tca-system or tca-mgr depending on appliance type)
    kubectl delete pod -n tca-system -l app=tca-deploy
    
  6. Monitor the new pod status until it is Running:

    kubectl get pods -n tca-system -l app=tca-deploy -w
    
  7. Verify that disk space has been reclaimed:

    df -h /
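The steps above can be consolidated into one script for review. This is a sketch, not official tooling: the `run` wrapper, `cleanup_stale_images` function, and `DRY_RUN` switch are illustrative, and the script prints commands by default so the sequence can be inspected before running it for real as root:

```shell
# Print each command in dry-run mode (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

cleanup_stale_images() {
  ns=${1:-tca-system}   # may be tca-mgr depending on appliance type
  run df -h /                                           # baseline
  run cp /root/.kube/config /root/.kube/config.bak      # back up invalid config
  run cp /etc/kubernetes/admin.conf /root/.kube/config  # install valid credentials
  run kubectl delete pod -n "$ns" -l app=tca-deploy     # restart the deploy pod
  run kubectl get pods -n "$ns" -l app=tca-deploy       # confirm it comes back
  run df -h /                                           # verify space reclaimed
}

# Review the sequence first; set DRY_RUN=0 to execute as root.
cleanup_stale_images tca-system
```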



Additional Information

This issue is a secondary symptom related to the internal certificate rotation behavior documented in the following Knowledge Base article. For details on the underlying certificate expiry issue and kubelet remediation, refer to: