Error: Too many pods.

Article ID: 426141

Products

VMware Telco Cloud Automation

Issue/Introduction

  • Pods are failing to start on the TCA Manager appliance or the TCA-CP appliance.
  • Checking the kubectl get pods -n tca-mgr output, you see a high count of cleanup jobs in either a Pending or Error state (a quick way to count these is shown after this list).
  • The logs directory on the appliance may show 100% disk utilization.

  • Describing one of the cleanup pods in an Error state shows the event:
    Warning  FailedScheduling  8m24s  default-scheduler  0/1 nodes are available: 1 Too many pods. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

  • Attempting to deploy workloads may fail with:
    "Grant request failed: Error loading Kubernetes API resources: Unauthorized"

Environment

VMware Telco Cloud Automation 3.2

Cause

Kubelet has a default limit of 110 pods per node. Once this limit is reached, no new pods can be scheduled on that node. This condition may also cause the logs directory on the appliance to fill up.
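
To confirm the node has hit its pod limit, you can compare the node's pod capacity with the number of pods currently present. These are generic Kubernetes checks rather than TCA-specific commands:

  • kubectl get nodes -o jsonpath='{.items[0].status.capacity.pods}'
  • kubectl get pods -A --no-headers | wc -l

If the second number is at or near the first (110 by default), the node cannot accept new pods until the excess jobs are cleaned up.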

Resolution

Clean Up The Excess Jobs:

  1. Identify which pods are in excess:
    • kubectl get pods -n <NAMESPACE>

      Example: kubectl get pods -n tca-mgr

      Note:
      You can identify which jobs these pods belong to based on their names. If this is occurring on the TCA Control Plane (TCA-CP) appliance, update the namespace accordingly.

  2. Identify which jobs are in excess:

    • kubectl get jobs -n tca-mgr

  3. Clean up the excess jobs:
    • kubectl delete job -n tca-mgr <JOB_NAME>
    • Alternatively, if the count is too high to clean up individually, you can use the following command to delete all matching jobs in bulk (a variant that previews the matches first is sketched after this list):
    • for job in `kubectl get jobs -n tca-mgr | grep -E <JOB_NAME> | awk '{print $1}'`; do kubectl delete job -n tca-mgr $job; done
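
  Before running the bulk delete, you can preview which jobs the expression matches by running the pipeline without the delete step (this sketch assumes the same tca-mgr namespace and <JOB_NAME> placeholder as above):

    • kubectl get jobs -n tca-mgr | grep -E <JOB_NAME> | awk '{print $1}'

  If the list looks correct, an equivalent bulk delete that avoids the shell loop is:

    • kubectl get jobs -n tca-mgr -o name | grep -E <JOB_NAME> | xargs kubectl delete -n tca-mgr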


Free Up Space:

  1. Log in to the TCA Manager or TCA-CP appliance via SSH as admin and switch to root.

  2. Verify the disk usage using df -h to confirm the partition (typically /dev/mapper/vg_logs-lv_logs) is at 100%.

  3. Identify large files consuming space (e.g., find / -type f -size +100M) or check the log rotation directories (a short disk-usage sketch follows this list).

  4. Clear unnecessary files (old log bundles, core dumps, or rotated logs).

  5. Once space is reclaimed, reboot the appliance to restore the Kubernetes API functionality.
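
A quick way to narrow down which files or directories are consuming the space, assuming you are logged in as root and using <FULL_MOUNT_POINT> as a placeholder for the mount point that df -h reports as full:

  • df -h
  • du -ah <FULL_MOUNT_POINT> 2>/dev/null | sort -rh | head -20

Only remove files you have confirmed are safe to delete, such as the old log bundles, core dumps, or rotated logs mentioned in step 4.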