A customer running TAP v1.6.2 on TKGs found that new workloads could not be created in the TAP iterate cluster.
Troubleshooting steps:
Step 1. Run "tanzu apps workload get -n <dev-namespace> <workload_name>" to get the current workload status.
Step 2. Once the stuck pod is identified, run "kubectl describe pod <workload_name>-build-xxx-build-pod -n <dev-namespace>" to get the current pod status.
Name:             tanzu-java-web-app-build-1-build-pod
Namespace:        cnd
Priority:         0
Service Account:  default
Node:             iterate1-1-md-2-########-qx7tt/10.##.147.###
Start Time:       Mon, 23 Oct 2023 08:31:27 +0000
... ...
Status:           Failed
Reason:           Evicted
Message:          The node was low on resource: ephemeral-storage.
IP:               100.##.2.###
IPs:
  IP:           100.##.2.###
Controlled By:  Build/tanzu-java-web-app-build-1
Init Containers:
  setup-ca-certs:
    Container ID:   containerd://1df111abded768ee531865b96625f3eb65effa18f70a76dcd056f222f87971e3
    Image:          harbordev1.uat.###.com/tap-packages/tap-packages@sha256:f839f262cf638aa4994d8afeeb1a7654770cd3d3ed811ed88e661e3424f3287f
    Image ID:       harbordev1.uat.###.com/tap-packages/tap-packages@sha256:f839f262cf638aa4994d8afeeb1a7654770cd3d3ed811ed88e661e3424f3287f
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 23 Oct 2023 08:31:32 +0000
      Finished:     Mon, 23 Oct 2023 08:31:33 +0000
    Ready:          True
    Restart Count:  0
... ...
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
From the output above, the pod is in Failed status because it was evicted: the node (iterate1-1-md-2-########-qx7tt/10.##.147.###) was low on ephemeral-storage.
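When several builds have failed, it can help to list every evicted pod in the namespace rather than describing them one at a time. The following is a minimal sketch, not part of the original procedure: list_evicted_pods is a hypothetical helper name, and it assumes the default table output of "kubectl get pods", where STATUS is the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column is "Evicted".
# Reads the default table output of "kubectl get pods" on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and skips the header row.
list_evicted_pods() {
  awk 'NR > 1 && $3 == "Evicted" { print $1 }'
}

# Usage (needs cluster access; replace <dev-namespace> with the real namespace):
#   kubectl get pods -n <dev-namespace> | list_evicted_pods
```

Each name it prints can then be passed to the "kubectl describe pod" command from Step 2 to confirm the eviction reason and the affected node.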
Step 3. SSH to the node and run "df -h" to check the disk size and the space still available.
According to the "Resource requirements" section of the TAP documentation, to deploy the Tanzu Application Platform packages with the build, run, and iterate (shared) profiles, the cluster must have at least 100 GB of disk space available per node.
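The comparison against the 100 GB figure can also be scripted on the node. This is a rough sketch under stated assumptions: has_min_free_gb is a hypothetical helper name, the path argument is whichever mount point backs ephemeral storage on the node (the article does not name it), and the value is computed in GiB from the "df -Pk" output, which is close enough for a quick check.

```shell
# Hypothetical helper: succeed when the filesystem holding the given path has
# at least the required amount of space available, as reported by df.
has_min_free_gb() {
  path=$1
  required_gb=$2
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # the fourth column of the data row is the available space.
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  # Integer division: KiB -> MiB -> GiB (treated as GB for this rough check).
  avail_gb=$((avail_kb / 1024 / 1024))
  echo "${path}: ${avail_gb} GB available, ${required_gb} GB required"
  [ "$avail_gb" -ge "$required_gb" ]
}

# Usage on the node (the path "/" is an assumption; adjust to the relevant mount point):
#   has_min_free_gb / 100 || echo "below the documented minimum"
```

The function only reports what df shows at that moment; a node that evicted pods may have already reclaimed some space, so a passing check does not rule out the earlier shortage.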
There are two options to fix this issue: