Guest Cluster Upgrade to 1.31.11 Fails with "secretgen-controller failed" Due to Gatekeeper Policies


Article ID: 437802



Products

VMware vSphere Kubernetes Service

Issue/Introduction

When upgrading a vSphere Kubernetes Service (VKS) guest cluster from version 1.30.1 to 1.31.11, the upgrade stalls or fails.

The following symptoms are observed:

  • The cluster's AddonsReconciled condition is set to False.
  • Describing the cluster or checking the upgrade status reveals a timeout related to the secretgen-controller addon:

error: clusters.cluster.x-k8s.io "cluster-name" could not be patched: admission webhook denied the request: upgrade cannot be initiated as cluster's AddonsReconciled condition is not True. Message: Addon Secretgen-Controller is not ready: kapp: Error: Timed out waiting after 30s for resources. Reason: ReconcileFailed

  • The secretgen-controller deployment does not reach a Ready state.
  • Events in the secretgen-controller namespace show that the Gatekeeper admission webhook is explicitly denying the creation of pods:
    Events:
    Type     Reason        Age                    From                   Message
    ----     ------        ----                   ----                   -------
    Warning  FailedCreate  10m (x2113 over 23d)   replicaset-controller  Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [allow-labels-only] All pods must have labels of owner and appName. [allow-only-xxxx-repo] container <secretgen-controller> has an invalid image repo localhost:5000/tkg/packages/core/secretgen-controller@sha256:xxxx, allowed repos are <repos FQDN>

Environment

VMware vSphere 8.0 with Tanzu

Cause

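This issue is caused by an existing Gatekeeper (Open Policy Agent) installation within the guest cluster. Gatekeeper's validating admission webhook intercepts and denies the creation of system pods required for the upgrade because they violate constraints defined in the cluster. In the events shown above, the secretgen-controller pod is rejected both for missing the required owner and appName labels and for using an image repository that is not on the allowed list.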

Resolution

To resolve this issue, you must configure the Gatekeeper webhook to exempt system namespaces or temporarily relax the failure policy to allow the upgrade components to deploy.
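
As an illustration of the namespace exemption described above, the following is a minimal sketch of a Gatekeeper Config resource that excludes a namespace from policy evaluation. It assumes Gatekeeper is installed in its default gatekeeper-system namespace and that the affected pods run in a namespace named secretgen-controller. Both names are assumptions and should be checked against the actual cluster. If a Config resource already exists, add the entry to its existing match list rather than replacing the resource.

```yaml
# Sketch only: the namespace names below are assumptions, verify them first.
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  # Gatekeeper only reconciles a Config resource named "config".
  name: config
  # Assumed default Gatekeeper installation namespace.
  namespace: gatekeeper-system
spec:
  match:
    # Assumed namespace hosting the secretgen-controller pods.
    - excludedNamespaces: ["secretgen-controller"]
      # "*" skips the namespace for all Gatekeeper processes
      # (webhook, audit, sync, mutation-webhook).
      processes: ["*"]
```

Once the exemption is in place, the replicaset-controller should be able to create the secretgen-controller pods on its next retry, after which the AddonsReconciled condition can return to True and the upgrade can be attempted again.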

Please refer to the Gatekeeper Guide for further information.