After initiating a workload cluster upgrade from VKR v1.33.x to v1.34.x, the upgrade is stuck because the workload cluster's antrea-controller fails to start.
While connected to the affected workload cluster context, one or more of the following symptoms are observed:
"Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
kubectl get pods -n kube-system
Describing the antrea-controller pod shows the following image pull error message:
kubectl describe pod -n kube-system <antrea-controller-pod>
Back-off pulling image "localhost:5000/tkg/packages/core/antrea@sha256:<sha>": ErrImagePull: rpc error: code = NotFound desc = failed to pull and unpack image "localhost:5000/tkg/packages/core/antrea@sha256:<sha>": failed to resolve reference "localhost:5000/tkg/packages/core/antrea@sha256:<sha>": localhost:5000/tkg/packages/core/antrea@sha256:<sha>: not found
Couldn't get configMap kube-system/antrea-config-ver-#: configmap "antrea-config-ver-#" not found.
kubectl get pods -n kube-system
kubectl get replicaset -n kube-system | grep antrea
NAME                       DESIRED   CURRENT   READY
antrea-controller-<id-a>   1         #         #
antrea-controller-<id-b>   1         #         #
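The replicaset symptom in the listing above can be checked with a small filter. This is a minimal sketch, assuming a POSIX shell with awk available; the function name count_antrea_controller_rs is hypothetical and not part of any VMware tooling. It reads the output of kubectl get replicaset on standard input and counts the antrea-controller replicasets whose DESIRED column is 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or VKS tooling).
# Reads `kubectl get replicaset -n kube-system` output on stdin and prints
# the number of antrea-controller replicasets whose DESIRED column is 1.
count_antrea_controller_rs() {
  awk '$1 ~ /^antrea-controller-/ && $2 == 1 { n++ } END { print n + 0 }'
}
```

Possible usage: kubectl get replicaset -n kube-system | count_antrea_controller_rs. A result greater than 1 corresponds to the listing above, where two antrea-controller replicasets each have a desired count of 1.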
vSphere Supervisor
VKR upgrade from v1.33.x to v1.34.x
The antrea-controller is unable to start properly because of a race condition during its startup.
This issue has been observed more frequently in workload clusters using a single control plane node.
This issue will be resolved in a future VKS version.
The antrea-controller deployment must be deleted manually so that the system recreates it.
kubectl get deployment -n kube-system antrea-controller -o yaml > antrea-controller-deploy-backup.yaml
kubectl delete deployment -n kube-system antrea-controller
kubectl patch pkgi -n vmware-system-tkg cluster-antrea --type='merge' -p '{"spec":{"syncPeriod":"9m"}}'
kubectl get deployment -n kube-system
kubectl get pods -n kube-system
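The commands above can be wrapped in one function so that the deletion only runs after a usable backup exists. This is a minimal sketch, not an official procedure: the function name recreate_antrea_controller and the empty-backup guard are assumptions added for illustration, and it assumes kubectl already points at the affected workload cluster context. The kubectl invocations themselves are the ones listed above, unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around the resolution commands listed above.
# Assumes kubectl is already using the affected workload cluster context.
recreate_antrea_controller() {
  # Optional first argument: path of the backup file.
  backup="${1:-antrea-controller-deploy-backup.yaml}"

  # 1. Back up the deployment; stop if the export fails or is empty.
  kubectl get deployment -n kube-system antrea-controller -o yaml > "$backup" || return 1
  [ -s "$backup" ] || { echo "backup $backup is empty; aborting" >&2; return 1; }

  # 2. Delete the deployment so that it can be recreated.
  kubectl delete deployment -n kube-system antrea-controller || return 1

  # 3. Patch the package install (same patch as in the steps above).
  kubectl patch pkgi -n vmware-system-tkg cluster-antrea --type='merge' \
    -p '{"spec":{"syncPeriod":"9m"}}' || return 1

  # 4. List deployments and pods for manual verification.
  kubectl get deployment -n kube-system
  kubectl get pods -n kube-system
}
```

Calling recreate_antrea_controller with no arguments writes the backup to antrea-controller-deploy-backup.yaml in the current directory. The guard in step 1 is a design choice of this sketch: if the export produces an empty file, the function returns before anything is deleted.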
If the antrea-controller deployment needs to be reverted, apply the backup file saved earlier:
kubectl apply -f antrea-controller-deploy-backup.yaml