VKS Cluster status False after transitioning from Single-Zone to Multi-Zone topology

Article ID: 439746

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When a VMware vSphere Kubernetes Service (VKS) cluster is upgraded to VKS 3.6 on VCF 9.1 and the cluster's namespace is transitioned from a Single-Zone to a Multi-Zone setup, the Cluster status may become False, with an error reported on the TopologyReconciled condition. This occurs when the cluster configuration uses a StorageClass with Immediate binding mode and no failure domain is defined.

The following error is observed in the Cluster status or in the output of kubectl describe cluster:

error reconciling the Cluster topology: failed to create patch helper for Cluster <REDACTED_NAME>: server side apply dry-run failed for modified object: admission webhook "capi.validating.tanzukubernetescluster.run.tanzu.vmware.com" denied the request: spec.topology.workers.machineDeployments[0].variables.volumes[0].storageClass: Invalid value: "<REDACTED_STORAGE_CLASS>": StorageClass "<REDACTED_STORAGE_CLASS>" uses "Immediate" binding mode; WaitForFirstConsumer (latebinding suffix) binding mode is required in multi-zone environment if no failureDomain is specified

Environment

VMware Cloud Foundation (VCF) 9.1
VMware vSphere Kubernetes Service (VKS) 3.6

Cause

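VKS 3.6 enforces a validation rule: a cluster in a multi-zone environment must use StorageClass objects with WaitForFirstConsumer binding mode unless an explicit failureDomain is defined for the Node Pool. With WaitForFirstConsumer, volume provisioning is delayed until a pod that uses the volume is scheduled, so the volume is created in the same availability zone as that pod. With Immediate binding and no failureDomain, the volume could be provisioned in a different zone from the node that needs it.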
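As an illustration of what the webhook is checking, the binding mode is the volumeBindingMode field of the StorageClass. The fragment below is a sketch of how a late-binding class might appear when inspected; the metadata name is hypothetical, and the parameters section is omitted because its contents depend on the environment.

```yaml
# Illustrative only: the name below is a hypothetical placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-policy-latebinding   # hypothetical; note the -latebinding suffix
provisioner: csi.vsphere.vmware.com  # vSphere CSI driver
# "Immediate" here is what triggers the validation error in a multi-zone
# environment when no failureDomain is set on the Node Pool.
volumeBindingMode: WaitForFirstConsumer
```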

Resolution

To restore cluster health, update the Cluster specification using one of the following methods. Note that both options will trigger a rolling update of the affected node pools.

Option 1: Explicitly Define Failure Domains

Modify the Cluster YAML to define a specific failureDomain for each MachineDeployment (Node Pool). This allows the use of Immediate binding because the zone is deterministic.
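A minimal sketch of Option 1 follows. It shows only the relevant part of the Cluster spec; the Node Pool names, class name, replica counts, and zone names are hypothetical placeholders and should be replaced with the values from the affected cluster and the zones available to its namespace.

```yaml
# Fragment of a Cluster spec; all other fields are omitted for brevity.
spec:
  topology:
    workers:
      machineDeployments:
        - class: node-pool          # hypothetical class name
          name: node-pool-1         # hypothetical Node Pool name
          replicas: 3
          failureDomain: zone-a     # hypothetical zone; pins this pool to one zone
        - class: node-pool
          name: node-pool-2         # hypothetical
          replicas: 3
          failureDomain: zone-b     # hypothetical
```

Every MachineDeployment that references an Immediate-binding StorageClass needs its own failureDomain; a pool left without one would still be expected to fail the validation.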

Option 2: Use a WaitForFirstConsumer (WFFC) StorageClass

Update the storageClass reference in the Cluster specification to a class that supports WaitForFirstConsumer binding. These are typically identified by a -latebinding suffix in VCF environments.
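A minimal sketch of Option 2 follows. The field path in the error message (machineDeployments[0].variables.volumes[0].storageClass) suggests the reference lives in a volumes variable on the Node Pool; the exact shape of that variable is an assumption here and should be checked against the existing Cluster YAML. The pool name, volume name, mount path, capacity, and StorageClass name are hypothetical.

```yaml
# Fragment of a Cluster spec; the structure of the "volumes" value is assumed
# from the error path and may differ in your ClusterClass.
spec:
  topology:
    workers:
      machineDeployments:
        - class: node-pool                  # hypothetical
          name: node-pool-1                 # hypothetical
          variables:
            overrides:
              - name: volumes
                value:
                  - name: containerd        # hypothetical volume
                    mountPath: /var/lib/containerd
                    capacity: 50Gi          # hypothetical size
                    # Changed from the Immediate-binding class to its
                    # late-binding counterpart:
                    storageClass: example-policy-latebinding
```

If the same StorageClass is referenced elsewhere in the Cluster specification, those references likely need the same change.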

Additional Information

VCF-GS Domain/SME Definitions
