A customer running Tanzu Application Platform (TAP) on top of TKGs found that they were unable to create new workloads in the TAP iterate cluster.
Troubleshooting steps:
Step 1. Run "tanzu apps workload get -n <dev-namespace> <workload_name>" to get the current workload status and identify the pod that is stuck.
Step 2. Once you have identified the stuck pod, run "kubectl describe pod <workload_name>-build-xxx-build-pod -n <dev-namespace>" to get its current status. Example output (truncated):
Name:             tanzu-java-web-app-build-1-build-pod
Namespace:        cnd
Priority:         0
Service Account:  default
Node:             iterate1-1-md-2-########-qx7tt/10.##.147.###
Start Time:       Mon, 23 Oct 2023 08:31:27 +0000
... ...
Status:           Failed
Reason:           Evicted
Message:          The node was low on resource: ephemeral-storage.
IP:               100.##.2.###
IPs:
  IP:  100.##.2.###
Controlled By:    Build/tanzu-java-web-app-build-1
Init Containers:
  setup-ca-certs:
    Container ID:   containerd://1df111abded768ee531865b96625f3eb65effa18f70a76dcd056f222f87971e3
    Image:          harbordev1.uat.###.com/tap-packages/tap-packages@sha256:f839f262cf638aa4994d8afeeb1a7654770cd3d3ed811ed88e661e3424f3287f
    Image ID:       harbordev1.uat.###.com/tap-packages/tap-packages@sha256:f839f262cf638aa4994d8afeeb1a7654770cd3d3ed811ed88e661e3424f3287f
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 23 Oct 2023 08:31:32 +0000
      Finished:     Mon, 23 Oct 2023 08:31:33 +0000
    Ready:          True
    Restart Count:  0
... ...
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
The output shows that the pod is in the Failed state because it was evicted: the node (iterate1-1-md-2-########-qx7tt/10.##.147.###) was low on resource: ephemeral-storage.
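When many workloads are affected, the Step 2 check can also be scripted instead of describing pods one at a time. The following is a minimal sketch, not part of TAP or the Tanzu CLI: the helper names find_evicted_pods and get_pods are hypothetical, and it assumes kubectl is on PATH with a kubeconfig pointing at the iterate cluster. It reads the pod list from "kubectl get pods -n <dev-namespace> -o json" and keeps pods whose status.reason is Evicted and whose status.message names ephemeral-storage, reporting the node each one ran on so you know which node to SSH into.

```python
import json
import subprocess
from typing import Any


def find_evicted_pods(pod_list: dict[str, Any], resource: str = "ephemeral-storage") -> list[dict[str, str]]:
    """Return name, node and message for pods evicted because the node ran low on `resource`.

    `pod_list` is the parsed JSON of `kubectl get pods -o json` (a v1 PodList).
    """
    evicted = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        message = status.get("message", "")
        # Evicted pods end up with phase "Failed" and reason "Evicted"; the kubelet
        # names the starved resource in the message, e.g. "low on resource: ephemeral-storage".
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted" and resource in message:
            evicted.append({
                "name": pod["metadata"]["name"],
                "node": pod.get("spec", {}).get("nodeName", ""),
                "message": message,
            })
    return evicted


def get_pods(namespace: str) -> dict[str, Any]:
    """Fetch all pods in `namespace` as JSON (hypothetical helper; needs kubectl and a valid kubeconfig)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For the namespace in the sample output above, usage would look like `for p in find_evicted_pods(get_pods("cnd")): print(p["name"], p["node"], p["message"])`. Keeping the filtering in a pure function separate from the kubectl call makes it easy to check against saved JSON without cluster access.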
Step 3. SSH to the node and run "df -h" to check the disk size.
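According to the TAP documentation (Resource requirements), to deploy the Tanzu Application Platform build, run, and iterate (shared) profile packages, your cluster must have at least 100 GB of disk space available per node. When a node's disk is smaller than this or nearly full, the kubelet evicts pods once free space drops below its eviction threshold, which is what happened to the build pod above.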
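The Step 3 comparison against the 100 GB minimum can be sketched as below. This assumes Python 3 is available on the node (not guaranteed on every node image; if it is not, read the same figures from "df -h"), that "GB" means decimal gigabytes (10**9 bytes), and that the kubelet and containerd data live on the root filesystem; the helper names are hypothetical. Note that "df -h" reports sizes in powers of 1024, so a 100 GB disk appears as roughly 93G.

```python
import shutil

# Minimum from the TAP "Resource requirements" page: 100 GB of disk per node.
# Assumption: GB is read as decimal gigabytes (10**9 bytes).
MIN_DISK_GB = 100
BYTES_PER_GB = 10**9


def meets_disk_requirement(total_bytes: int, min_gb: int = MIN_DISK_GB) -> bool:
    """True if a filesystem of `total_bytes` is at least `min_gb` GB."""
    return total_bytes >= min_gb * BYTES_PER_GB


def check_node_disk(path: str = "/", min_gb: int = MIN_DISK_GB) -> dict:
    """Report the size and free space of the filesystem holding `path` and whether it meets the minimum.

    Run this on the node itself after SSHing in. `path` defaults to "/" (an assumption);
    point it at the mount holding kubelet/containerd data if that is a separate filesystem.
    """
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_gb": round(usage.total / BYTES_PER_GB, 1),
        "free_gb": round(usage.free / BYTES_PER_GB, 1),
        "meets_requirement": meets_disk_requirement(usage.total, min_gb),
    }
```

Calling `print(check_node_disk("/"))` on the node prints the totals; a `meets_requirement` of False points to an undersized node disk rather than a transient spike in usage.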
There are two options to fix this issue: