When an upgrade to VKR version "v1.34.1---vmware.1-vkr.4" or later is initiated on a VKS cluster with a single control plane node, the upgrade fails or gets stuck if the cluster has Kubernetes policies applied via VKS cluster management.
VKS cluster management installs Gatekeeper on workload clusters to enforce policy. On a cluster with a single control plane node, the Gatekeeper webhook interferes with the VKS workload cluster upgrade.
VCF Automation 9.0.1
The VKS cluster upgrade can be unblocked by reconfiguring the Gatekeeper webhook installed on the workload cluster.
Edit the gatekeeper-validating-webhook-configuration webhook configuration and either change the "failurePolicy" value from "Fail" to "Ignore", or add the "flow-aggregator" namespace to the relevant namespaceSelector.
Use the command below to edit the webhook:
kubectl edit validatingwebhookconfigurations gatekeeper-validating-webhook-configuration
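Before editing, the current failurePolicy of each webhook entry can be listed without opening an editor. The following is a minimal sketch using standard kubectl jsonpath output. The helper name list_failure_policies and the variable names are hypothetical and not part of any product tooling. The command is read-only.

```shell
# Name of the webhook configuration, as given in this article.
WEBHOOK_CFG="gatekeeper-validating-webhook-configuration"

# jsonpath template: prints one line per webhook entry,
# in the form "<webhook name><TAB><failurePolicy>".
POLICY_TEMPLATE='{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\n"}{end}'

# Hypothetical helper: read-only listing of the failurePolicy values.
list_failure_policies() {
  kubectl get validatingwebhookconfigurations "$WEBHOOK_CFG" \
    -o jsonpath="$POLICY_TEMPLATE"
}

# Usage (requires kubectl access to the workload cluster):
#   list_failure_policies
```

The order of the printed lines matches the order of the entries in the webhooks list, which helps identify the entry that is set to "Fail".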
Implement one of the following options to resolve the issue:
(Option 1) Search for "failurePolicy". You will find 2 occurrences: one is set to "Ignore" and the other one is set to "Fail". Change "Fail" to "Ignore".
(Option 2) For the 2nd instance of webhook containing failurePolicy: Fail, add another namespace "flow-aggregator" to the existing namespaceSelector list, as shown below:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
        - gatekeeper-system
        - vmware-system-antrea
        - tkg-system
        - flow-aggregator
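As a non-interactive alternative to kubectl edit, either option could be applied with kubectl patch and a JSON patch. This is a sketch under stated assumptions, not a verified procedure. The helper names are hypothetical, and the zero-based indexes passed to them (which webhook entry has failurePolicy: Fail, and which matchExpressions entry holds the namespace list) are assumptions that must be confirmed against the live object first.

```shell
WEBHOOK_CFG="gatekeeper-validating-webhook-configuration"

# Option 1: build a JSON patch that sets failurePolicy to "Ignore".
# $1 = zero-based index of the webhook entry (assumption: verify it first).
build_policy_patch() {
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"Ignore"}]' "$1"
}

# Option 2: build a JSON patch that appends "flow-aggregator" to the
# namespace list. "/-" at the end of the path appends to the array.
# $1 = zero-based webhook index, $2 = zero-based matchExpressions index
# (both are assumptions: verify them first).
build_namespace_patch() {
  printf '[{"op":"add","path":"/webhooks/%s/namespaceSelector/matchExpressions/%s/values/-","value":"flow-aggregator"}]' "$1" "$2"
}

# Hypothetical helper: apply a JSON patch to the webhook configuration.
apply_patch() {
  kubectl patch validatingwebhookconfigurations "$WEBHOOK_CFG" \
    --type=json -p "$1"
}

# Usage, assuming the entry with failurePolicy: Fail is the 2nd one (index 1)
# and the namespace list is in the 1st matchExpressions entry (index 0):
#   apply_patch "$(build_policy_patch 1)"          # Option 1
#   apply_patch "$(build_namespace_patch 1 0)"     # Option 2
```

Only one of the two patches is needed. Building the patch separately from applying it makes it possible to print and review the JSON before anything is changed on the cluster.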
This issue specifically impacts single control-plane VKS clusters. Multi-control-plane clusters are not typically affected.