When attempting to revert or downgrade core components in a vSphere Kubernetes Service (VKS) Guest Cluster, the PackageInstall (e.g., tkgs-npr-main-kapp-controller) fails to reconcile with the following error:
Stopped installing matched version '0.50.0+vmware.1-tkg.3-vmware' since last attempted version '0.55.0+vmware.1-fips-tkg.1' is higher.
hint: Add annotation packaging.carvel.dev/downgradable: "" to PackageInstall to proceed with downgrade
After applying the required packaging.carvel.dev/downgradable annotation to bypass the version check, the reconciliation remains stuck, eventually timing out with:
kapp: Error: Timed out waiting after 30s for resources: apiservice/v1alpha1.data.packaging.carvel.dev (apiregistration.k8s.io/v1) cluster
deployment/kapp-controller (apps/v1) namespace: tkg-system
In this state, the kapp-controller deployment in the Guest Cluster shows 0/1 replicas available, and no pods are present in the tkg-system namespace.
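The state described above can be confirmed from the Guest Cluster context. The following is a minimal sketch; the function name show_blocked_rollout is hypothetical, and the deployment and namespace names are taken from the error output above. When pod creation is rejected at admission, the rejection usually appears as a warning event on the ReplicaSet rather than on a pod, which is why the namespace events are listed.

```shell
# show_blocked_rollout: print the deployment, pod, and event state for
# kapp-controller in tkg-system (names taken from the timeout error above).
show_blocked_rollout() {
  kubectl get deployment kapp-controller -n tkg-system
  kubectl get pods -n tkg-system
  # Most recent events are printed last.
  kubectl get events -n tkg-system --sort-by=.lastTimestamp
}

# Run against the Guest Cluster context:
# show_blocked_rollout
```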
VMware vSphere Kubernetes Service
The reconciliation deadlock is caused by a two-stage failure:
Version Protection: Carvel kapp-controller prevents downgrades by default to avoid accidental state corruption, requiring a manual annotation.
Admission Controller Blockade: Once the downgrade is forced, the kapp-controller deployment attempts to rotate pods. However, a Kyverno ClusterPolicy (e.g., check-image-registry) intended to restrict image pulls to corporate registries contains an incorrect pattern in its exclusion list. The pattern tkg-system-* requires a trailing hyphen after tkg-system, so it fails to match the literal tkg-system namespace.
Consequently, the Kyverno webhook denies the creation of the new kapp-controller pods because they attempt to pull images from the internal local registry (localhost:5000). Without running pods, the v1alpha1.data.packaging.carvel.dev APIService cannot initialize, leading to the 30-second timeout.
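The pattern mismatch can be reproduced outside the cluster. The sketch below uses POSIX shell glob matching as a stand-in for Kyverno's wildcard matching. This is an assumption for illustration only: both treat * as "zero or more characters", but Kyverno does not use the shell's matcher. The helper name matches_pattern is hypothetical.

```shell
# matches_pattern NAME PATTERN: succeed if NAME matches the glob PATTERN.
# Shell globbing is only a stand-in for Kyverno's wildcard matching here.
matches_pattern() {
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

for pattern in 'tkg-system-*' 'tkg-system' 'tkg-system*'; do
  if matches_pattern tkg-system "$pattern"; then
    echo "$pattern: matches tkg-system"
  else
    echo "$pattern: does not match tkg-system"
  fi
done
```

Only the first pattern fails to match, because it requires a hyphen after tkg-system. Listing the literal namespace is the stricter fix. A pattern such as tkg-system* would also exclude any other namespace that starts with that prefix.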
1. Enable Downgrade (Supervisor Context)
Annotate the PackageInstall on the Supervisor Cluster with packaging.carvel.dev/downgradable: "" to permit the version rollback.
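A minimal sketch of the annotation step follows. The annotation key and its empty value come from the hint in the error message. The function name allow_downgrade is hypothetical, and the PackageInstall name and namespace are placeholders that must be looked up in the Supervisor context.

```shell
# allow_downgrade PKGI_NAME NAMESPACE
# Adds the empty-valued downgradable annotation named in the kapp-controller hint.
allow_downgrade() {
  kubectl annotate packageinstall "$1" -n "$2" \
    'packaging.carvel.dev/downgradable=' --overwrite
}

# Example (Supervisor context, replace the placeholders with real values):
# allow_downgrade <packageinstall-name> <namespace>
```

The --overwrite flag only makes the command safe to repeat if the annotation is already present.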
2. Correct Kyverno Policy (Guest Cluster)
kubectl edit clusterpolicy <policy-name>
For webhooks that prevent image pulls and pod creation based on given namespaces, allow the following namespaces that are integral to VKS cluster lifecycle events:
kube-system
vmware-system-antrea
vmware-system-auth
vmware-system-cloud-provider
vmware-system-csi
tkg-system
secretgen-controller
vmware-system-supervisor-services
Also allow the namespace that houses the VKS components. Its name is unique to each environment and can be retrieved with:
kubectl get ns | grep svc-tkg
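One way to sanity-check an exclusion list against the namespaces above is sketched here. It prints every required namespace that none of the supplied patterns covers. Shell glob matching is assumed as a rough approximation of Kyverno's wildcard behavior, the function name missing_namespaces is hypothetical, and the sample exclusion list is invented to resemble the faulty policy. The environment-specific svc-tkg namespace is not included and would need to be appended to the list.

```shell
# Namespaces from the list above (the environment-specific svc-tkg
# namespace is not included and must be added by hand).
REQUIRED_NAMESPACES="kube-system vmware-system-antrea vmware-system-auth vmware-system-cloud-provider vmware-system-csi tkg-system secretgen-controller vmware-system-supervisor-services"

# missing_namespaces PATTERN...
# Prints each required namespace that no PATTERN matches.
missing_namespaces() {
  for ns in $REQUIRED_NAMESPACES; do
    covered=no
    for pattern in "$@"; do
      case "$ns" in
        $pattern) covered=yes ;;
      esac
    done
    [ "$covered" = yes ] || echo "$ns"
  done
}

# Hypothetical exclusion list resembling the faulty policy:
missing_namespaces 'kube-system' 'vmware-system-*' 'tkg-system-*' 'secretgen-controller'
```

With this sample list the only namespace reported is tkg-system. Replacing 'tkg-system-*' with 'tkg-system' produces no output.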
3. Reset Deployment State (Guest Cluster)
kubectl rollout restart deployment <deployment-name> -n <namespace>
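After the restart, recovery can be checked with a short sequence such as the one below. This is a sketch, not a prescribed procedure. The deployment, namespace, and APIService names are taken from the timeout error earlier in this article. The function name verify_kapp_controller and the 120s timeout are assumptions.

```shell
# verify_kapp_controller: wait for the rollout to finish, then show the pods
# and the APIService that the timeout error referred to.
verify_kapp_controller() {
  kubectl rollout status deployment/kapp-controller -n tkg-system --timeout=120s &&
  kubectl get pods -n tkg-system &&
  kubectl get apiservice v1alpha1.data.packaging.carvel.dev
}

# Run against the Guest Cluster context:
# verify_kapp_controller
```

If the rollout status command times out, the pods are probably still being rejected. In that case, review the policy exclusions again before retrying.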