Guest Cluster Upgrade Stalls with Control Plane Node stuck in "Provisioning" State


Article ID: 429661

Products

Tanzu Kubernetes Runtime

Issue/Introduction

  • When performing a rolling upgrade of a Tanzu Kubernetes Grid (TKG) Guest Cluster, the process may stall after partially completing.
  • One of the Control Plane nodes remains in a Provisioning or Pending state indefinitely.
  • The virtual machine for the new node is powered on, but logging into the guest OS and running crictl ps -a returns an empty list, indicating the container runtime has not started any system pods.
  • kubectl get machine -n <namespace> shows that the Machine object for the new node is not progressing.

Environment

2.3.x, 2.4.x, 2.5.x

Cause

The Management Cluster control plane has insufficient disk space, which prevents the controllers from reconciling the Guest Cluster and provisioning the new node.

Resolution

  1.  Verify and Clean Management Cluster Disk Space

    1. SSH into the Management Cluster Control Plane nodes

    2. Check disk utilization: 

      df -h

    3. If the root (/) or log (/var/log) partitions are near 100% utilization, identify and remove old logs or temporary files to bring utilization below 80%.

      Note: Photon OS uses /var/log/messages, while Ubuntu uses /var/log/syslog.

      1. Truncate the messages or syslog log file, based on the OS in use:

        truncate -s 0 /var/log/messages
        truncate -s 0 /var/log/syslog

      2. Remove compressed log files:

        rm -f /var/log/messages.*.gz
        rm -f /var/log/syslog.*.gz
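
      Before deleting anything, it can help to see which files are actually consuming the space. A minimal sketch using standard coreutils (the /var/log path follows the note above; run as root on the node for a complete picture):

```shell
# Show overall filesystem utilization (same check as step 2 above)
df -h /
# List the ten largest entries under /var/log; permission errors are
# suppressed with 2>/dev/null, so run as root for complete results.
largest=$(du -xh /var/log 2>/dev/null | sort -rh | head -10)
echo "$largest"
```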

  2.  Reconcile the Guest Cluster State

    1. Identify the name of the stuck Machine in the Guest Cluster namespace (on the Management Cluster):

      kubectl get machine -n <guest-cluster-namespace>

    2. Delete the stuck Machine object:

      kubectl delete machine <stuck-machine-name> -n <guest-cluster-namespace>

    3. Monitor the recreation process. The Cluster API controller will detect the missing replica (count dropping from 3 to 2) and automatically provision a new VM.
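
The monitoring step can be sketched as follows; the namespace name below is a placeholder, and the guarded check simply keeps the snippet runnable on hosts without kubectl:

```shell
# Sketch: list Machine objects after deleting the stuck one. Substitute
# your guest cluster's namespace for the placeholder below.
if command -v kubectl >/dev/null 2>&1; then
  status=$(kubectl get machine -n guest-cluster-ns 2>&1)
else
  status="kubectl not available on this host"
fi
echo "$status"
# To follow the rollout live (Ctrl-C to stop watching):
#   kubectl get machine -n <guest-cluster-namespace> -w
```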

Additional Information

You can also reclaim space by vacuuming the systemd journal logs.

Review journal disk usage:

journalctl --disk-usage

Vacuum the journal logs to a target size:

journalctl --vacuum-size=100M
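
The two commands above can be combined into a short sketch; the 100M size is an example threshold, and --vacuum-time is a standard journalctl alternative for age-based cleanup:

```shell
# Report current journal disk usage; stderr is captured too, so the
# command degrades gracefully on hosts without a readable journal.
usage=$(journalctl --disk-usage 2>&1 || true)
echo "$usage"
# Reclaim space (as root) by size or by age, for example:
#   journalctl --vacuum-size=100M   # keep at most ~100 MB of journal
#   journalctl --vacuum-time=7d     # or: keep only the last 7 days
```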