You upgrade a vSphere Kubernetes Service (VKS) guest cluster and the upgrade gets stuck while updating the control plane nodes.
The cluster reports the following error:
Control plane components: Node xyz does not exist
You fetch all control plane machines and see that the machine xyz does exist and is in a running state.
kubectl get machine -n <NAMESPACE> | grep -v worker
Use kubeconfig of the cluster to look at the nodes:
export TKC=<YOUR GUEST CLUSTER NAME> NS=<CLUSTER'S NAMESPACE>
kubectl get secret -n ${NS} ${TKC}-kubeconfig -o jsonpath='{.data.value}' | base64 -d > ${TKC}-kubeconfig
KUBECONFIG=${TKC}-kubeconfig kubectl get nodes --show-labels | grep -v "worker"
You will see that the node is Ready but is not marked as a control plane node, and it does not have the "node-role.kubernetes.io/control-plane=" label.
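The label check above can also be scripted. The following is a minimal sketch, not an official tool: the function name nodes_missing_cp_label is hypothetical, and it assumes the standard column layout of "kubectl get nodes --show-labels --no-headers" (node name first, status second, labels last). It prints the names of Ready, non-worker nodes that lack the control plane label.

```shell
# Hypothetical helper (not part of any VMware or Kubernetes tooling).
# Reads `kubectl get nodes --show-labels --no-headers` output on stdin and
# prints each node that is Ready, is not a worker line, and has no
# node-role.kubernetes.io/control-plane= label in the last column.
nodes_missing_cp_label() {
  awk '!/worker/ && $2 ~ /^Ready/ &&
       index($NF, "node-role.kubernetes.io/control-plane=") == 0 { print $1 }'
}

# Usage against the guest cluster (assumes the kubeconfig file written earlier):
# KUBECONFIG=${TKC}-kubeconfig kubectl get nodes --show-labels --no-headers | nodes_missing_cp_label
```

Any node name printed by this helper is a candidate for the failed join described in this article.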
SSH into the control plane node (see the Additional Information section below) and tail /var/log/cloud-init-output.log. You will see an error similar to:
[####-##-## ##:##:##] [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
[####-##-## ##:##:##] I0913 14:56:41.226935 1926 kubelet.go:332] [kubelet-start] preserving the crisocket information for the node
[####-##-## ##:##:##] I0913 14:56:41.227454 1926 patchnode.go:32] [patchnode] Uploading the CRI socket "unix:///var/run/containerd/containerd.sock" to Node "cluster-name-xyz" as an annotation
[####-##-## ##:##:##] error execution phase kubelet-wait-bootstrap: error writing CRISocket for this node: Unauthorized
[####-##-## ##:##:##] To see the stack trace of this error execute with --v=5 or higher
[####-##-## ##:##:##] + set -xe
[####-##-## ##:##:##] + touch /root/kubeadm-complete
[####-##-## ##:##:##] + vmware-rpctool 'info-set guestinfo.kubeadm.phase complete'
This means that an error occurred during cloud-init and kubeadm join failed.
Any Kubernetes cluster can be affected by this issue.
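To confirm this signature without reading the whole log, you can search for the error string. This is a minimal sketch: the function name join_failed_unauthorized is hypothetical, and the search string is copied from the log excerpt in this article, so a log with different wording will not match.

```shell
# Hypothetical helper: succeeds (exit 0) if the given log file contains the
# "Unauthorized" CRISocket error seen when kubeadm join fails.
join_failed_unauthorized() {
  grep -q 'error writing CRISocket for this node: Unauthorized' "$1"
}

# Usage on the affected control plane node:
# join_failed_unauthorized /var/log/cloud-init-output.log && echo "kubeadm join failed: Unauthorized"
```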
Please contact Broadcom support for assistance.