When a VMware Kubernetes Service (VKS) cluster is upgraded to VKS 3.6 on VCF 9.1 and the cluster's Namespace is transitioned from a single-zone to a multi-zone setup, the TopologyReconciled condition in the Cluster status may report False. This occurs when the cluster configuration uses a StorageClass with Immediate binding mode and no failureDomain is defined for the node pool.
The following error is observed in the Cluster status or in the output of kubectl describe cluster:
error reconciling the Cluster topology: failed to create patch helper for Cluster <REDACTED_NAME>: server side apply dry-run failed for modified object: admission webhook "capi.validating.tanzukubernetescluster.run.tanzu.vmware.com" denied the request: spec.topology.workers.machineDeployments[0].variables.volumes[0].storageClass: Invalid value: "<REDACTED_STORAGE_CLASS>": StorageClass "<REDACTED_STORAGE_CLASS>" uses "Immediate" binding mode; WaitForFirstConsumer (latebinding suffix) binding mode is required in multi-zone environment if no failureDomain is specified
VMware Cloud Foundation (VCF) 9.1
VMware Kubernetes Service (VKS) 3.6
VKS 3.6 enforces a validation rule: clusters in a multi-zone environment must use StorageClass objects with WaitForFirstConsumer binding mode unless an explicit failureDomain is defined for the node pool. WaitForFirstConsumer delays volume binding and provisioning until the consuming pod is scheduled, which ensures that volumes are provisioned in the same availability zone as the pod.
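For reference, the binding mode named in the error is the volumeBindingMode field of the StorageClass object. The sketch below shows what a late-binding class might look like. The class name is a hypothetical placeholder, and in a Supervisor environment these objects are normally generated from storage policies rather than written by hand, so treat this as an illustration of the field being checked, not as a manifest to apply.

```yaml
# Illustrative only: the name is a hypothetical placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-policy-latebinding
provisioner: csi.vsphere.vmware.com
# "Immediate" provisions the volume as soon as the claim is created;
# "WaitForFirstConsumer" waits until the consumer has been scheduled.
volumeBindingMode: WaitForFirstConsumer
```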
To restore cluster health, update the Cluster specification using one of the following methods. Note that both options will trigger a rolling update of the affected node pools.
Option 1: Explicitly Define Failure Domains
Modify the Cluster YAML to define a specific failureDomain for each MachineDeployment (Node Pool). This allows the use of Immediate binding as the zone is deterministic.
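A minimal sketch of Option 1, shown as a fragment of the Cluster specification. The node pool name, worker class, replica count, and zone name are hypothetical placeholders. Substitute the values from the existing Cluster and a zone that is actually available to the Namespace.

```yaml
# Fragment of the Cluster spec; all names and values below are placeholders.
spec:
  topology:
    workers:
      machineDeployments:
        - class: node-pool         # assumed worker class name from the ClusterClass
          name: node-pool-1        # hypothetical node pool name
          replicas: 3              # hypothetical replica count
          failureDomain: zone-a    # hypothetical zone; pins this node pool to one zone
```

Repeat the failureDomain setting for every entry under machineDeployments that references a StorageClass with Immediate binding.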
Option 2: Use a WaitForFirstConsumer (WFFC) StorageClass
Update the storageClass reference in the Cluster specification to a class that supports WaitForFirstConsumer binding. These are typically identified by a -latebinding suffix in VCF environments.
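A minimal sketch of Option 2, again as a fragment of the Cluster specification. The overrides layout is an assumption based on the field path in the error message and the standard Cluster API variable override structure. The exact schema of the volumes variable depends on the ClusterClass in use, and the node pool, volume, and class names are hypothetical placeholders.

```yaml
# Fragment of the Cluster spec; layout assumed, names are placeholders.
spec:
  topology:
    workers:
      machineDeployments:
        - class: node-pool         # assumed worker class name from the ClusterClass
          name: node-pool-1        # hypothetical node pool name
          variables:
            overrides:
              - name: volumes
                value:
                  - name: containerd    # hypothetical volume name
                    # Only the storageClass reference changes; keep the
                    # volume's other existing fields as they are.
                    storageClass: example-policy-latebinding
```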