A workload cluster is stuck while upgrading to KR v1.31.1.
While connected to the Supervisor cluster context, one or more of the following symptoms are observed:
kubectl get machine -n <workload cluster namespace>
<workload cluster namespace> machine.cluster.x-k8s.io/<new node name> <workload cluster> vsphere://<vsphere id> Running 10m <KR v1.31.1

In this scenario, the workload cluster's worker node pools have not yet upgraded to the desired version because the workload cluster's control plane nodes are not all healthy.
kubectl get machine -n <workload cluster namespace>
<workload cluster namespace> machine.cluster.x-k8s.io/<new node name> <workload cluster> vsphere://<vsphere id> Provisioned ##m <KR v1.31.1
kubectl describe cluster -n <workload cluster namespace> <workload cluster name>
* NodeHealthy:
* Node.Ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
status:
  conditions:
  - lastTransitionTime: "YYYY-MM-DDTHH:MM:SSZ"
    message: |-
kapp: Error: update customresourcedefinition/tiers.crd.antrea.io (apiextensions.k8s.io/v1) cluster:
Updating resource customresourcedefinition/tiers.crd.antrea.io (apiextensions.k8s.io/v1) cluster:
API server says:
CustomResourceDefinition.apiextensions.k8s.io "tiers.crd.antrea.io" is invalid: status.storedVersions[0]:
Invalid value: "v1alpha1": must appear in spec.versions (reason: Invalid)
While connected to the affected workload cluster context, the following symptoms are observed:
kubectl get pods -A | grep antrea
NAMESPACE NAME READY STATUS
kube-system antrea-agent-<id-1> 0/2 Init:ErrImagePull
kube-system antrea-agent-<id-2> 0/2 Init:ImagePullBackOff
kube-system antrea-agent-<id-3> 1/2 Running
kube-system antrea-controller-<id> 0/1 CrashLoopBackOff
kubectl logs -n kube-system <antrea-controller-pod>
Starting Antrea Controller (version v1.15.1-ea6613a)
Error running controller: failed to clean up the deprecated APIServices: apiservices.apiregistration.k8s.io "v1beta1.networking.antrea.tanzu.vmware.com" is forbidden: User "system:serviceaccount:kube-system:antrea-controller" cannot delete resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
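The log message points to a missing RBAC permission. As a quick optional check (a standard kubectl impersonation query, not part of the original procedure), the permission gap can be confirmed from the workload cluster context:

kubectl auth can-i delete apiservices.apiregistration.k8s.io --as=system:serviceaccount:kube-system:antrea-controller

A response of "no" confirms that the antrea-controller service account lacks the cluster-scoped delete permission reported in the log.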
vSphere Supervisor 8.0
vSphere Supervisor 9.0
VKS Service 3.2.0 and higher
Workload Cluster upgrading to KR v1.31.1
In vSphere Supervisor, KR v1.31.1 includes Antrea version 2.1, which retires the following advanced Antrea CRDs from earlier versions.
This issue can occur if any of these advanced APIs were in use in the workload cluster before it was upgraded to KR v1.31.1:
CRD                  | CRD Version | Introduced In | Deprecated In | Removed In
ClusterGroup         | v1alpha2    | v1.0.0        | v1.1.0        | v2.0.0
ClusterGroup         | v1alpha3    | v1.1.0        | v1.13.0       | v2.0.0
ClusterNetworkPolicy | v1alpha1    | v1.0.0        | v1.13.0       | v2.0.0
Egress               | v1alpha2    | v1.0.0        | v1.13.0       | v2.0.0
ExternalEntity       | v1alpha1    | v0.10.0       | v0.11.0       | v2.0.0
ExternalIPPool       | v1alpha2    | v1.8.0        | v1.13.0       | v2.0.0
Group                | v1alpha3    | v1.8.0        | v1.13.0       | v2.0.0
NetworkPolicy        | v1alpha1    | v1.0.0        | v1.13.0       | v2.0.0
Tier                 | v1alpha1    | v1.0.0        | v1.13.0       | v2.0.0
Traceflow            | v1alpha1    | v1.0.0        | v1.13.0       | v2.0.0
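To check whether any Antrea CRDs still store objects in one of the retired versions, the storedVersions status field can be surveyed from the workload cluster context (a generic kubectl query; the column names are illustrative):

kubectl get crds -o custom-columns='NAME:.metadata.name,STORED:.status.storedVersions' | grep antrea

Any CRD whose STORED column still lists a retired version (for example, v1alpha1 for tiers.crd.antrea.io) needs the storage migration described below.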
kubectl get pods -A | grep antrea-pre
kubectl get pods,jobs -A | grep antrea-pre
kubectl logs -n vmware-system-antrea <antrea-pre-upgrade-pod name>
Failed antrea-pre-upgrade pods can be cleaned up without issue.
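For example (the pod name is a placeholder):

kubectl delete pod -n vmware-system-antrea <antrea-pre-upgrade-pod name>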
kubectl describe job -n vmware-system-antrea <antrea-pre-upgrade-job name>
kubectl get app -n vmware-system-tkg | grep antrea
kubectl describe app -n vmware-system-tkg <workload cluster name>-antrea
usefulErrorMessage: |-
kapp: Error: waiting on reconcile job/antrea-pre-upgrade-job (batch/v1) namespace: vmware-system-antrea:
Finished unsuccessfully (Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit)
kubectl get pods -A | grep antrea
kubectl cp <antrea-agent-pod>:/usr/local/bin/antctl antctl -n kube-system
If the above command does not work, the antctl CLI can be downloaded from the links in Additional Information.
ls -ltr
chmod 555 antctl
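To confirm the copied binary is usable before proceeding, its version can be checked (antctl version should at minimum print the client version):

./antctl version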
kubectl get pkgi -A | grep antrea
kubectl patch pkgi <workload cluster name>-antrea -n <antrea namespace> --type merge -p '{"spec":{"paused": true}}'
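To verify the PackageInstall is paused before continuing, the paused field can be read back (expected output: true):

kubectl get pkgi <workload cluster name>-antrea -n <antrea namespace> -o jsonpath='{.spec.paused}'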
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io crdvalidator.antrea.io -o yaml > antrea-vwhc-backup.yaml
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io crdmutator.antrea.io -o yaml > antrea-mwhc-backup.yaml
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io crdvalidator.antrea.io
kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io crdmutator.antrea.io
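To verify both webhook configurations were removed before running the migration (a simple read-back check):

kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep antrea

Neither crdvalidator.antrea.io nor crdmutator.antrea.io should appear in the output.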
./antctl upgrade api-storage --dry-run
./antctl upgrade api-storage
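After the migration completes, the stored versions of the previously failing CRD can be re-checked (using the tiers CRD from the earlier error as an example):

kubectl get crd tiers.crd.antrea.io -o jsonpath='{.status.storedVersions}'

The retired version (v1alpha1) should no longer be listed.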
kubectl patch pkgi <workload cluster name>-antrea -n <antrea namespace> --type merge -p '{"spec":{"paused": false}}'
kubectl get job -n vmware-system-antrea | grep antrea
kubectl get job -n vmware-system-antrea antrea-pre-upgrade-job -o yaml > antrea-pre-upgrade-job-backup.yaml
kubectl delete job -n vmware-system-antrea antrea-pre-upgrade-job
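A quick read-back (expected to return NotFound) confirms the job is gone before triggering a new reconciliation:

kubectl get job -n vmware-system-antrea antrea-pre-upgrade-job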
kubectl patch app <workload cluster name>-antrea -n <antrea namespace> --type='merge' -p '{"spec":{"syncPeriod":"9m"}}'
The above command harmlessly changes the syncPeriod of the Antrea app, which triggers an immediate reconciliation because a change was made to the app's spec.
If multiple reconciliations are needed, this value can be toggled back and forth between 9m and 10m, as shown below.
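For example, a subsequent toggle back (same placeholders as above):

kubectl patch app <workload cluster name>-antrea -n <antrea namespace> --type='merge' -p '{"spec":{"syncPeriod":"10m"}}'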
kubectl get jobs,pods -n vmware-system-antrea
kubectl get pods -A | grep antrea
kubectl get app,pkgi -A | grep antrea
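Once reconciliation completes, the antrea pods should be Running with all containers ready, and the app should report a successful reconcile; for example, from the Supervisor context (using the vmware-system-tkg namespace shown earlier), the DESCRIPTION column should read "Reconcile succeeded":

kubectl get app -n vmware-system-tkg <workload cluster name>-antrea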
There is an Antrea CLI tool called antctl which migrates objects from the old CRDs to the new CRDs.
Alternatively, it can be downloaded at the bottom of the following page under Assets: https://github.com/antrea-io/antrea/releases/tag/v2.1.0