This issue affects management clusters that were created in TKG 1.6 or earlier and then upgraded to TKG 2.1 (and possibly further to 2.2 or 2.3). On these management clusters, creating new workload clusters fails:
Status:
  Conditions:
    Last Transition Time:  2023-08-22T13:32:38Z
    Message:               Scaling up control plane to 3 replicas (actual 1)
    Reason:                ScalingUp
    Severity:              Warning
    Status:                False
    Type:                  Ready
    Last Transition Time:  2023-08-22T13:34:28Z
    Status:                True
    Type:                  Available
Conditions:
  Type  Status  LastHeartbeatTime  LastTransitionTime  Reason  Message
  ----  ------  -----------------  ------------------  ------  -------
  …  KubeletNotReady  container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
"kapp": { "refName": "kapp-controller.tanzu.vmware.com.0.41.7+vmware.1-tkg.1" },
E0822 13:32:35.420973 1 clusterbootstrapclone.go:789] ClusterBootstrapController "msg"="failed to getGVR" "error"="unable to retrieve the complete list of server APIs: controlplane.antrea.tanzu.vmware.com/v1beta1: the server is currently unable to handle the request, controlplane.antrea.tanzu.vmware.com/v1beta2: the server is currently unable to handle the request, stats.antrea.tanzu.vmware.com/v1alpha1: the server is currently unable to handle the request, system.antrea.tanzu.vmware.com/v1beta1: the server is currently unable to handle the request"
E0823 00:59:15.953455 1 available_controller.go:524] v1beta1.system.antrea.tanzu.vmware.com failed with: failing or missing response from https://100.67.143.140:443/apis/system.antrea.tanzu.vmware.com/v1beta1: bad status from https://100.67.143.140:443/apis/system.antrea.tanzu.vmware.com/v1beta1: 404
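The log lines above point at Antrea APIService objects that the API server can no longer reach. As a minimal sketch for listing them, assuming `kubectl` targets the management cluster and the default `kubectl get apiservice` column layout (NAME, SERVICE, AVAILABLE, AGE), a small filter could look like this. The function name `unavailable_antrea_apiservices` is hypothetical and not part of TKG.

```shell
# Hypothetical helper (not part of TKG): reads `kubectl get apiservice`
# output on stdin and prints the names of Antrea APIServices whose
# AVAILABLE column is anything other than "True".
unavailable_antrea_apiservices() {
  awk '$1 ~ /\.antrea\.tanzu\.vmware\.com$/ && $3 != "True" { print $1 }'
}

# Usage, assuming the current kubectl context is the management cluster:
#   kubectl get apiservice | unavailable_antrea_apiservices
```

If the filter prints any names, those APIServices are likely the ones blocking API discovery for the addons controller.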
The Antrea ClusterResourceSet "<management-cluster-name>-antrea" keeps restoring the deprecated APIService objects on the management cluster. This can prevent the tanzu-addons-controller-manager from bootstrapping kapp-controller and other addons, such as the CNI, on the workload cluster.
ClusterResourceSets are no longer used on management clusters after TKG 2.1. They will be cleaned up in future releases.
This issue is fixed in TKG 2.4.0.
kubectl get clusterresourceset <management-cluster-name>-antrea -n tkg-system -oyaml > antrea-crs.yaml
kubectl get secret <management-cluster-name>-antrea-crs -n tkg-system -oyaml > antrea-crs-secret.yaml
kubectl delete clusterresourceset <management-cluster-name>-antrea -n tkg-system
kubectl delete secret <management-cluster-name>-antrea-crs -n tkg-system
kubectl delete apiservice v1beta1.networking.antrea.tanzu.vmware.com
kubectl delete apiservice v1beta1.controlplane.antrea.tanzu.vmware.com
kubectl delete apiservice v1alpha1.stats.antrea.tanzu.vmware.com
kubectl delete apiservice v1beta1.system.antrea.tanzu.vmware.com
kubectl delete apiservice v1beta2.controlplane.antrea.tanzu.vmware.com
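The workaround steps above could be wrapped in two shell functions, sketched here under the assumption that `kubectl` targets the management cluster. The function names are hypothetical. The `--ignore-not-found` flag is an addition beyond the documented commands, so that re-running the sketch does not fail on objects that are already gone.

```shell
# Sketch only: function names are illustrative, not part of TKG.

# Back up, then delete, the Antrea ClusterResourceSet and its secret.
# $1 is the management cluster name.
remove_antrea_crs() {
  mc="$1"
  kubectl get clusterresourceset "${mc}-antrea" -n tkg-system -o yaml > antrea-crs.yaml
  kubectl get secret "${mc}-antrea-crs" -n tkg-system -o yaml > antrea-crs-secret.yaml
  kubectl delete clusterresourceset "${mc}-antrea" -n tkg-system
  kubectl delete secret "${mc}-antrea-crs" -n tkg-system
}

# Delete the deprecated Antrea APIService objects listed in the workaround.
delete_antrea_apiservices() {
  for svc in \
    v1beta1.networking.antrea.tanzu.vmware.com \
    v1beta1.controlplane.antrea.tanzu.vmware.com \
    v1alpha1.stats.antrea.tanzu.vmware.com \
    v1beta1.system.antrea.tanzu.vmware.com \
    v1beta2.controlplane.antrea.tanzu.vmware.com; do
    # --ignore-not-found is an assumption added for safe re-runs.
    kubectl delete apiservice "$svc" --ignore-not-found
  done
}

# Usage (replace the placeholder with the real management cluster name):
#   remove_antrea_crs <management-cluster-name> && delete_antrea_apiservices
```

Removing the ClusterResourceSet first matters in this ordering, since it is the object that restores the APIServices.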
This workaround is a one-time task. After the steps above are completed, the problem will not recur in future TKG upgrades.
To avoid encountering this problem, before upgrading the management cluster from TKG 1.6 to TKG 2.1, back up and delete the related ClusterResourceSet and its secret as shown below if the ClusterResourceSet status is failing.
kubectl get clusterresourceset <management-cluster-name>-antrea -n tkg-system -oyaml > antrea-crs.yaml
kubectl get secret <management-cluster-name>-antrea-crs -n tkg-system -oyaml > antrea-crs-secret.yaml
kubectl delete clusterresourceset <management-cluster-name>-antrea -n tkg-system
kubectl delete secret <management-cluster-name>-antrea-crs -n tkg-system
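The article does not spell out how to tell whether the ClusterResourceSet status is failing. One possible check, sketched here as an assumption, is to read the Cluster API `ResourcesApplied` condition on the object and treat anything other than `True` as failing. The function names are hypothetical, and a missing ClusterResourceSet would also be reported as failing by this sketch.

```shell
# Sketch only: interprets "status is failing" as the ResourcesApplied
# condition not being "True". That interpretation is an assumption.

# Print the ResourcesApplied condition status of the Antrea CRS.
# $1 is the management cluster name.
antrea_crs_applied_status() {
  kubectl get clusterresourceset "$1-antrea" -n tkg-system \
    -o jsonpath='{.status.conditions[?(@.type=="ResourcesApplied")].status}'
}

# Succeeds (exit 0) when the condition is anything other than "True".
antrea_crs_is_failing() {
  [ "$(antrea_crs_applied_status "$1")" != "True" ]
}

# Usage (replace the placeholder with the real management cluster name):
#   if antrea_crs_is_failing <management-cluster-name>; then
#     echo "ClusterResourceSet looks unhealthy; back it up and delete it"
#   fi
```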
If the management cluster was created in TKG 1.6 or earlier and has already been upgraded to TKG 2.1, 2.2, or 2.3, perform the workaround steps above before the next upgrade to avoid this issue.