This issue is resolved in TKr v1.23.8---vmware.3-tkg.1
Workaround:
There are two options when a Worker Node is stuck in this state:
- Delete the Worker Node. This will automatically trigger a new node rollout and the containerd mount will be reset.
- SSH directly into the Worker Node and restart the containerd service.
To apply the workaround:
----------------------------------------------------
Identify the Problem Node:
1. Use the kubectl vsphere login command from your jumpbox to connect to the TKGS Guest Cluster (a sample login follows this list). The following documentation will help with this: Kubectl vSphere Login
2. Identify the pods stuck in ContainerCreating state (sample output also follows this list):
# kubectl get pods -A -o wide | grep ContainerCreating
3. Note the Worker Node on which the pods stuck in ContainerCreating are scheduled.
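For reference, the login in step 1 typically looks like the sketch below; every angle-bracket value is a placeholder for your environment:
# kubectl vsphere login --server=<SUPERVISOR_IP> --vsphere-username <USERNAME> --tanzu-kubernetes-cluster-name <CLUSTER_NAME> --tanzu-kubernetes-cluster-namespace <TKGS_CLUSTER_NAMESPACE>
For step 2, a matching line might look like the following; the pod and node names are invented for illustration:
# kubectl get pods -A -o wide | grep ContainerCreating
kube-system   vsphere-csi-node-h8zrw   0/3   ContainerCreating   0   23m   <none>   tkgs-cluster-workers-jvp66-6d9f5c7b8   <none>   <none>
In this example, the problem node would be tkgs-cluster-workers-jvp66-6d9f5c7b8.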
----------------------------------------------------
Option 1: Delete the Problem Node:
1. Use the kubectl vsphere login command from your jumpbox to connect to the TKGS Guest Cluster.
2. List the nodes using:
# kubectl get nodes
3. Delete the problem node identified above (a confirmation check follows):
# kubectl delete node <nodename>
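Deleting the node automatically triggers the rollout of a replacement. One way to confirm recovery is to watch the node list until the new Worker Node appears and reaches the Ready state:
# kubectl get nodes -w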
----------------------------------------------------
Option 2: Restart Containerd on the Problem Node:
1. Use the following command to identify the machine and wcpmachine objects that back the problem node, and gather the associated IP address (a scripted shortcut follows this list):
- # kubectl get machine,wcpmachine -n <TKGS_CLUSTER_NAMESPACE> | grep <CLUSTER_NAME>
- Identify the machine that has the same name as the problem node. Note the ProviderID.
- Find the wcpmachine that has the same ProviderID as the machine associated with the problem node. Note the IP address.
- Use the following documentation to connect via SSH to the problem node at the IP identified above: Log Into TKGS Guest Cluster
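As an optional shortcut for the matching above, the ProviderID can be read straight from the machine object. This sketch assumes the standard Cluster API field .spec.providerID; <MACHINE_NAME> is a placeholder:
# kubectl get machine <MACHINE_NAME> -n <TKGS_CLUSTER_NAMESPACE> -o jsonpath='{.spec.providerID}{"\n"}'
Because the wcpmachine field layout can vary by release, dumping the full object and searching it for the matching ProviderID and IP address is the safest approach:
# kubectl get wcpmachine -n <TKGS_CLUSTER_NAMESPACE> -o yaml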
2. Once logged into the Problem Node, enable root privilege:
# sudo su
3. Confirm there are no containers running on the node:
# crictl ps
Example output:
# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
4. If there are no containers running, restart containerd:
# systemctl restart containerd
- NOTE: Restarting containerd will also restart any running containers. Do not restart containerd if you see containers running; if any containers are running, the environment is experiencing a different problem.
5. Wait a few moments, then check whether containers are starting:
# crictl ps
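If container entries are now appearing, the node has recovered. As a final verification, confirm that containerd is active and, from the jumpbox, that pods are no longer stuck in ContainerCreating:
# systemctl status containerd
# kubectl get pods -A -o wide | grep ContainerCreating
The grep should eventually return no results once all pods have started.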