VKS Cluster upgrade to 3.6 or VCF 9.1 fails with "StorageClass uses Immediate binding mode" in Multi-Zone environments

Article ID: 439821

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When a VMware vSphere Kubernetes Service (VKS) cluster is upgraded to version 3.6 (as part of a VCF 9.1 upgrade), cluster reconciliation may fail if the environment has been transitioned from Single-Zone to Multi-Zone. The Cluster status reports the TopologyReconciled condition as False.

Symptoms:

Running kubectl describe cluster <Cluster Name> returns the following:

 
Status:
  Conditions:
    Last Transition Time: 2026-01-08T17:16:34Z
    Message: error computing the desired state of the Cluster topology: failed to apply patches: failed to generate patches for patch "default": failed to call extension handler "generate-patches.runtime-extension": got failure response
    Reason: TopologyReconcileFailed
    Severity: Error
    Status: False
    Type: TopologyReconciled

The runtime-extension-controller-manager logs contain the following denial from the admission webhook:

2026-01-09T06:01:30 handler.go:129] "error during patch generation" err="unable to patch cluster with resolved KR data: admission webhook \"capi.validating.tanzukubernetescluster.run.tanzu.vmware.com\" denied the request: spec.topology.workers.machineDeployments[0].variables.volumes[0].storageClass: Invalid value: \"<StorageClass Name>\": StorageClass \"<StorageClass Name>\" uses \"Immediate\" binding mode; WaitForFirstConsumer (latebinding suffix) binding mode is required in multi-zone environment if no failureDomain is specified for MachineDeployment \"<Node Pool Name>\" at path spec.topology.workers.machineDeployments[0]"
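When many clusters are involved, the condition shown in the symptoms can be checked programmatically. The following is a minimal Python sketch, assuming the Cluster objects have been exported as a JSON list (for example with kubectl get cluster -A -o json, which is an assumption about how the resource name resolves in your context). The function name and the sample data are hypothetical and exist only for illustration.

```python
import json


def failed_topology_clusters(cluster_list):
    """Return (namespace, name, message) for every Cluster whose
    TopologyReconciled condition has status "False"."""
    failed = []
    for item in cluster_list.get("items", []):
        meta = item.get("metadata", {})
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "TopologyReconciled" and cond.get("status") == "False":
                failed.append((meta.get("namespace"), meta.get("name"), cond.get("message", "")))
    return failed


# Hypothetical sample shaped like a Kubernetes List object; all names are made up.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "ns-a", "name": "cluster-ok"},
     "status": {"conditions": [{"type": "TopologyReconciled", "status": "True"}]}},
    {"metadata": {"namespace": "ns-b", "name": "cluster-stuck"},
     "status": {"conditions": [{"type": "TopologyReconciled", "status": "False",
                                "reason": "TopologyReconcileFailed",
                                "message": "error computing the desired state of the Cluster topology"}]}}
  ]
}
""")

for namespace, name, message in failed_topology_clusters(SAMPLE):
    print(f"{namespace}/{name}: {message}")
```

The condition message alone does not name the StorageClass, so the runtime-extension-controller-manager logs still need to be checked to confirm that a flagged cluster is affected by this specific issue.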

Environment

  • VMware Cloud Foundation 9.1

  • VMware vSphere Kubernetes Service (VKS) 3.6

  • Supervisor configured with Multi-Zone support

Cause

Starting with VKS 3.6, the Runtime Extension enforces enhanced validation during cluster updates and upgrades. In a Multi-Zone setup, a node pool that does not specify a failureDomain must use StorageClasses with the WaitForFirstConsumer (WFFC) binding mode so that volumes are provisioned in the correct availability zone. If a cluster was originally created in a Single-Zone environment with a StorageClass that uses "Immediate" binding, and the environment was subsequently moved to Multi-Zone, the patch triggered by the upgrade fails this validation.
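The rule described above can be expressed as a small Python sketch. It is not the webhook's actual implementation. The node pool dictionary is a simplified layout modeled on the path in the error message, and the function name, pool name, and StorageClass names are hypothetical. The binding modes correspond to the volumeBindingMode field of a Kubernetes StorageClass.

```python
def rejected_storage_classes(node_pool, binding_modes):
    """Return the StorageClass names that would fail the Multi-Zone check.

    node_pool:     simplified dict with an optional "failureDomain" and a
                   "volumes" list (assumed layout, not the full Cluster schema).
    binding_modes: dict mapping StorageClass name -> volumeBindingMode.
    """
    # A node pool pinned to a failureDomain is exempt from the check.
    if node_pool.get("failureDomain"):
        return []
    return [
        vol["storageClass"]
        for vol in node_pool.get("volumes", [])
        if binding_modes.get(vol["storageClass"]) == "Immediate"
    ]


# Hypothetical StorageClasses and node pool, for illustration only.
BINDING_MODES = {"gold": "Immediate", "gold-latebinding": "WaitForFirstConsumer"}
pool = {"name": "pool-1", "volumes": [{"name": "data", "storageClass": "gold"}]}

print(rejected_storage_classes(pool, BINDING_MODES))
```

In this sketch the pool is flagged because it has no failureDomain and references a StorageClass with Immediate binding. Either changing the StorageClass or setting a failureDomain makes the returned list empty.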

Resolution

To resolve this issue, the Cluster specification must be updated to meet Multi-Zone requirements. Note that these changes will trigger a rolling update of the node pools.

  1. Identify affected clusters: Check for clusters with TopologyReconciled: False following the VKS 3.6 upgrade.

  2. Modify Cluster Specification:

    • Option 1 (Recommended): Update the storageClass for the affected Node Pools to the variant with the -latebinding suffix (which uses WaitForFirstConsumer binding).

    • Option 2: Explicitly define a failureDomain within the machineDeployments section of the Cluster spec for each node pool.

  3. Apply Changes: Save the updated Cluster configuration. The Runtime Extension can then generate patches successfully, and the cluster proceeds with a rolling update to reconcile the new topology.
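The two options in step 2 can be sketched in Python as transformations of a node pool entry. This is an illustration only: the dictionary is a simplified stand-in for an entry under spec.topology.workers.machineDeployments, not the full Cluster schema, and the function names, pool name, StorageClass name, and zone name are hypothetical. The sketch also assumes that a StorageClass with the -latebinding suffix exists for each StorageClass in use, which should be confirmed before editing the real Cluster.

```python
import copy

LATE_BINDING_SUFFIX = "-latebinding"


def use_latebinding(node_pool):
    """Option 1: point every volume at the -latebinding StorageClass variant."""
    updated = copy.deepcopy(node_pool)  # leave the input untouched
    for vol in updated.get("volumes", []):
        storage_class = vol.get("storageClass", "")
        if storage_class and not storage_class.endswith(LATE_BINDING_SUFFIX):
            vol["storageClass"] = storage_class + LATE_BINDING_SUFFIX
    return updated


def pin_failure_domain(node_pool, zone):
    """Option 2: set an explicit failureDomain on the node pool."""
    updated = copy.deepcopy(node_pool)
    updated["failureDomain"] = zone
    return updated


# Hypothetical node pool, for illustration only.
pool = {"name": "pool-1", "volumes": [{"name": "data", "storageClass": "gold"}]}

print(use_latebinding(pool))
print(pin_failure_domain(pool, "zone-1"))
```

Only one of the two options is needed per node pool. Either change still has to be applied to the actual Cluster specification, and doing so triggers the rolling update noted above.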

Additional Information

This validation is intended to prevent persistent volume binding failures, which occur when a volume from a StorageClass with "Immediate" binding is provisioned in a zone that does not match the zone of the node where the pod is eventually scheduled.