TAP workload deploy failed due to pod was evicted with message "The node was low on resource: ephemeral-storage"

Article ID: 297911

Updated On: 07-30-2024

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

A customer running TAP v1.6.2 on top of TKGs found that new workloads could not be created in the TAP iterate cluster.

Troubleshooting steps:
Step 1. Run "tanzu apps workload get -n <dev-namespace> <workload_name>" to get the current workload status.

Step 2. Once the stuck pod has been identified, run "kubectl describe pod <workload_name>-build-xxx-build-pod -n <dev-namespace>" to get the current pod status.

Name:             tanzu-java-web-app-build-1-build-pod
Namespace:        cnd
Priority:         0
Service Account:  default
Node:             iterate1-1-md-2-########-qx7tt/10.##.147.###
Start Time:       Mon, 23 Oct 2023 08:31:27 +0000
... ...
Status:           Failed
Reason:           Evicted
Message:          The node was low on resource: ephemeral-storage. 
IP:               100.##.2.###
IPs:
  IP:           100.##.2.###
Controlled By:  Build/tanzu-java-web-app-build-1
Init Containers:
  setup-ca-certs:
    Container ID:   containerd://1df111abded768ee531865b96625f3eb65effa18f70a76dcd056f222f87971e3
    Image:          harbordev1.uat.###.com/tap-packages/tap-packages@sha256:f839f262cf638aa4994d8afeeb1a7654770cd3d3ed811ed88e661e3424f3287f
    Image ID:       harbordev1.uat.###.com/tap-packages/tap-packages@sha256:f839f262cf638aa4994d8afeeb1a7654770cd3d3ed811ed88e661e3424f3287f
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 23 Oct 2023 08:31:32 +0000
      Finished:     Mon, 23 Oct 2023 08:31:33 +0000
    Ready:          True
    Restart Count:  0
... ...
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True

The output above shows that the pod is in Failed status because it was evicted: the node (iterate1-1-md-2-#########-qx7tt/10.xx.147.xxx) was low on the ephemeral-storage resource.
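When several builds are affected, it can help to list every failed pod in the namespace together with its reason and node instead of describing pods one by one. The following is a minimal sketch, not part of the original procedure; the function name list_failed_pods is hypothetical, and it assumes kubectl is already pointed at the iterate cluster.

```shell
# List pods in the given namespace whose phase is Failed, showing the
# status reason (for example "Evicted"), the node name, and the status message.
list_failed_pods() {
  namespace=$1
  kubectl get pods -n "$namespace" \
    --field-selector=status.phase=Failed \
    -o custom-columns='NAME:.metadata.name,REASON:.status.reason,NODE:.spec.nodeName,MESSAGE:.status.message'
}
```

Usage: list_failed_pods <dev-namespace>. Pods evicted for low ephemeral-storage should appear with REASON set to Evicted.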

Step 3. SSH to the node and run "df -h" to check the disk size and the remaining free space.

According to the Resource requirements section of the TAP documentation, deploying Tanzu Application Platform packages with the build, run, and iterate (shared) profiles requires at least 100 GB of disk space available per node.
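The "df -h" check can be turned into a pass/fail comparison against the 100 GB figure. This is a rough sketch under stated assumptions: the helper name check_free_gb is hypothetical, the path to inspect depends on where the node keeps kubelet and container runtime data (/var/lib/kubelet is a common default but is an assumption here), and it reports whole GiB rather than decimal GB.

```shell
# Print the free space, in whole GiB, of the filesystem that holds the given
# path, and return non-zero when it is below the requested minimum.
check_free_gb() {
  path=$1
  min_gb=$2
  # df -Pk prints one POSIX-format line per filesystem; column 4 is the
  # available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  avail_gb=$((avail_kb / 1024 / 1024))
  echo "$path: ${avail_gb} GiB available (minimum ${min_gb})"
  [ "$avail_gb" -ge "$min_gb" ]
}
```

Usage on the node: check_free_gb /var/lib/kubelet 100. A non-zero exit status means the filesystem has less free space than the requested minimum.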

Environment

Product Version: 1.6

Resolution

There are two options to fix this issue:

Increase the disk space on the nodes of this TKC cluster, then delete the workload and redeploy it.
Create a new TKC cluster with at least 100 GB of disk space available per node, then deploy the workload there.
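For the first option, the delete-and-redeploy step could look like the sketch below once the node disks have been enlarged. It is illustrative only: the function name redeploy_workload and the workload file argument are hypothetical, and the flags shown should be checked against "tanzu apps workload --help" for the installed CLI version.

```shell
# Delete an existing workload, wait for the deletion to finish, then
# recreate it from a saved workload definition file.
redeploy_workload() {
  name=$1
  namespace=$2
  workload_file=$3
  # Remove the existing workload; stop here if the deletion fails.
  tanzu apps workload delete "$name" -n "$namespace" --yes --wait || return 1
  # Recreate the workload from its definition file.
  tanzu apps workload apply -f "$workload_file" -n "$namespace" --yes
}
```

Usage: redeploy_workload <workload_name> <dev-namespace> <path-to-workload-yaml>.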
