TKC on version 1.27 shows "Updating" because the GatewayApi reconcile Failed

Article ID: 389256

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  1. The Guest Cluster status shows "Updating" when you describe the tkc/cluster object.

  2. All cluster nodes are up, running, and healthy. The kubectl commands that check the state of the VMs and machines show all of them in a "Running" state. No VMs are stuck in "deleting", and no additional VMs are stuck in "provisioning" or "provisioned" while trying to come up. However, the Cluster readiness state is "False".

  3. The following log snippet is observed in the tkg-controller-manager logs on the Supervisor:

    tanzukubernetescluster.go:319] "Condition indicates failure" logger="svc-tkg-domain-###-tkg-controller.tanzukubernetescluster-spec-controller.<namespace>.<Guest-cluster-name>" Condition="Gateway-Api-ReconcileFailed" Status="True"
    tanzukubernetescluster_controller.go:467] "Error while reconcilling cluster object requeuing for retry" err="fake error for quick requeuing" logger="svc-tkg-domain-###-tkg-controller.tanzukubernetescluster-spec-controller.<namespace>.<Guest-cluster-name>" cluster.name="<Guest-cluster-name>"


  4. Describing the clusterbootstrap object shows that the GatewayApi reconcile is failing because the gatewayclass object's storedVersions value is "v1alpha1pre1", whereas the expected value is "v1":

    message: |-
      kapp: Error: update customresourcedefinition/gatewayclasses.gateway.networking.k8s.io (apiextensions.k8s.io/v1) cluster:
        Updating resource customresourcedefinition/gatewayclasses.gateway.networking.k8s.io (apiextensions.k8s.io/v1) cluster:
          API server says:
            CustomResourceDefinition.apiextensions.k8s.io "gatewayclasses.gateway.networking.k8s.io" is invalid: status.storedVersions[0]:
              Invalid value: "v1": must appear in spec.versions (reason: Invalid)
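
The API server message above reports a version listed in status.storedVersions that does not appear in spec.versions of the CRD. A minimal Python sketch of that same consistency check is shown below. It assumes the CRD has been exported to JSON, for example with `kubectl get crd gatewayclasses.gateway.networking.k8s.io -o json` run against the Guest Cluster. The function name and the sample document (including the "v1beta1" entry) are hypothetical and serve only as an illustration.

```python
import json


def stored_versions_not_in_spec(crd):
    """Return the entries of status.storedVersions that are missing from spec.versions."""
    declared = {v["name"] for v in crd.get("spec", {}).get("versions", [])}
    stored = crd.get("status", {}).get("storedVersions", [])
    return [version for version in stored if version not in declared]


# Hypothetical, heavily trimmed CRD document used only for illustration.
sample_crd = json.loads("""
{
  "metadata": {"name": "gatewayclasses.gateway.networking.k8s.io"},
  "spec": {"versions": [{"name": "v1beta1"}]},
  "status": {"storedVersions": ["v1"]}
}
""")

missing = stored_versions_not_in_spec(sample_crd)
if missing:
    print("Stored versions missing from spec.versions:", missing)
else:
    print("All stored versions are declared in spec.versions")
```

An empty result means the stored versions and the declared versions are consistent; a non-empty result corresponds to the condition the API server rejects in the message above.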

Environment

Tanzu Kubernetes Release 1.27.x

Cause

The 'v1alpha1pre1' version of `GatewayClass` was never shipped in any current or past release of the GatewayApi. The first release of the GatewayApi started with version v1alpha1.

Resolution

The following steps should return the cluster to a "Running/Ready" state.

    1. Upgrade the cluster to version 1.28.x.
    2. Initiate a cluster rollout by changing the vmclass so that new nodes are deployed.
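
After the steps above, the readiness of the cluster can be re-checked. The following Python sketch assumes the tkc/cluster object has been exported to JSON (for example with `kubectl get tkc <Guest-cluster-name> -n <namespace> -o json`) and that it carries a standard Kubernetes-style status.conditions list with a "Ready" condition. The helper names and the sample object are hypothetical.

```python
def ready_condition(obj):
    """Return the condition of type "Ready" from status.conditions, or None if absent."""
    for condition in obj.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition
    return None


def is_ready(obj):
    """True only when a "Ready" condition exists and its status is the string "True"."""
    condition = ready_condition(obj)
    return condition is not None and condition.get("status") == "True"


# Hypothetical, trimmed objects used only for illustration.
before_fix = {"status": {"conditions": [{"type": "Ready", "status": "False"}]}}
after_fix = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}

for label, obj in (("before", before_fix), ("after", after_fix)):
    print(label, "ready" if is_ready(obj) else "not ready")
```

Condition statuses are strings in Kubernetes objects, so the comparison is against "True" rather than a boolean.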