ArgoCD becomes OutOfSync continuously after migrating a Tanzu Kubernetes Cluster to a Cluster API (CAPI) object (label kubernetes.vmware.com/retire-tkc)

Article ID: 432608

Products

Tanzu Kubernetes Runtime
VMware vSphere Kubernetes Service

Issue/Introduction

After migrating a Tanzu Kubernetes Cluster (TKC) to a Cluster API (CAPI) object (label kubernetes.vmware.com/retire-tkc), attempting to synchronize the cluster declaration via ArgoCD results in the following sequence of issues:

  1. Continuous OutOfSync status due to topology variables.

  2. Synchronization errors during deployment: InfrastructureReady: failed to create or update VirtualMachine: admission webhook "default.mutating.virtualmachine.v1alpha2.vmoperator.vmware.com" denied the request: no VM image exists for "<no value>" in namespace or cluster scope

  3. Tanzu systematically recreating worker nodes.

Environment

VMware Kubernetes Service
VMware vSphere with Tanzu
ArgoCD

Cause

This behavior is driven by three distinct factors during the cluster synchronization cycle:

  1. Topology Variables OutOfSync: Tanzu's mutating webhooks dynamically inject runtime variables (e.g., clusterEncryptionConfigYaml, user, and TKR_DATA) into the Cluster specification. ArgoCD detects these injected fields as configuration drift because they are not present in the declarative source manifests, resulting in a continuous OutOfSync state.

  2. VM Image Admission Webhook Error: The no VM image exists for "<no value>" error occurs when the vmoperator admission webhook cannot locate the required VirtualMachineImage (associated with the specified Tanzu Kubernetes release) within the namespace or cluster scope during the infrastructure provisioning phase.

  3. Worker Node Recreation: Cluster API (CAPI) objects utilizing ClusterClass topologies rely heavily on Server-Side Apply (SSA) to prevent patching conflicts. By default, ArgoCD utilizes Client-Side Apply. This causes ArgoCD to overwrite fields actively managed by CAPI controllers, which the controllers then attempt to correct, triggering continuous updates and the systematic recreation of worker nodes.

Resolution

1. Ignore dynamic topology variables in ArgoCD

Update the ArgoCD Application manifest to ignore differences for the webhook-injected variables (clusterEncryptionConfigYaml, user, and TKR_DATA).

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: <tkc-name>
  namespace: argocd
spec:
  # existing spec
  ignoreDifferences:
    - group: cluster.x-k8s.io
      kind: Cluster
      namespace: <namespace>
      jsonPointers:
        - /spec/topology/variables
      jqPathExpressions:
        - '.spec.topology.variables[] | select(.name == "clusterEncryptionConfigYaml")'
        - '.spec.topology.variables[] | select(.name == "user")'
        - '.spec.topology.variables[] | select(.name == "TKR_DATA")'
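
The block above lists both a jsonPointers entry for the whole /spec/topology/variables list and jqPathExpressions for three individual variables. Because the pointer already covers the entire list, ArgoCD would also stop reporting drift in variables that are defined in the source manifests. If that is not desired, a narrower variant might keep only the jq expressions. This is a sketch rather than a tested configuration, and it assumes the three named variables are the only ones injected in your environment.

```yaml
spec:
  ignoreDifferences:
    - group: cluster.x-k8s.io
      kind: Cluster
      namespace: <namespace>
      # Assumption: only these webhook-injected variables should be ignored;
      # every other topology variable stays subject to drift detection.
      jqPathExpressions:
        - '.spec.topology.variables[] | select(.name == "clusterEncryptionConfigYaml")'
        - '.spec.topology.variables[] | select(.name == "user")'
        - '.spec.topology.variables[] | select(.name == "TKR_DATA")'
```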

2. Resolve VM Image Admission Webhook Error

To resolve the no VM image exists for "<no value>" error during VirtualMachine creation, implement the resolution provided in the following Knowledge Base article: KB 406541

3. Prevent Worker Node Recreation by Enabling Server-Side Apply

To stop ArgoCD from conflicting with CAPI controllers and causing continuous worker node rollouts, enable Server-Side Apply (SSA) in the ArgoCD Application manifest:

spec:
  syncPolicy:
    syncOptions:
      - ServerSideApply=true
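
If enabling Server-Side Apply for the whole Application is not wanted, ArgoCD also accepts the same option per resource through the argocd.argoproj.io/sync-options annotation. The fragment below is a sketch of how that could look on the Cluster manifest stored in the source repository. The apiVersion shown and the <cluster-name> and <namespace> placeholders are assumptions and should match your existing manifest.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1   # assumption: keep the version already used in your manifest
kind: Cluster
metadata:
  name: <cluster-name>
  namespace: <namespace>
  annotations:
    # Enables Server-Side Apply for this resource only
    argocd.argoproj.io/sync-options: ServerSideApply=true
spec:
  # existing spec
```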

Note: If worker node rollouts continue after enabling Server-Side Apply, capture the exact differential ArgoCD is evaluating to pinpoint the webhook-injected fields.

  1. Run the following command against the ArgoCD application (or extract the diff view directly from the ArgoCD UI): argocd app diff <application-name> --hard-refresh

  2. Identify the specific field reported as OutOfSync (e.g., the annotation cluster.x-k8s.io/cloned-from-group).

  3. Add a targeted exclusion so that ArgoCD ignores only these specific fields rather than the entire resource block. For example:

ignoreDifferences:
  - group: cluster.x-k8s.io
    kind: MachineDeployment
    namespace: <namespace>
    jqPathExpressions:
      - '.metadata.annotations["cluster.x-k8s.io/cloned-from-group"]'
      - '.metadata.annotations["cluster.x-k8s.io/cloned-from-kind"]'
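
ArgoCD's ignoreDifferences also supports a managedFieldsManagers list, which ignores every field owned by the named field managers in metadata.managedFields. This may be a useful alternative when many controller-owned fields drift at once. The manager name below is a hypothetical placeholder, not a value taken from this article. The real name can be read from the live object, for example with kubectl get machinedeployment <name> -n <namespace> -o yaml --show-managed-fields.

```yaml
ignoreDifferences:
  - group: cluster.x-k8s.io
    kind: MachineDeployment
    namespace: <namespace>
    managedFieldsManagers:
      # Hypothetical placeholder: replace with the manager name found
      # under metadata.managedFields on the live object.
      - <capi-controller-field-manager>
```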

Additional Information

ArgoCD uses the ignoreDifferences configuration only to compute the diff between the live and desired state, which determines whether the application is reported as Synced or OutOfSync. During the sync itself, the desired state is still applied as-is. By default, ArgoCD uses Client-Side Apply, which often causes it to overwrite fields that the CAPI controllers manage, triggering continuous updates. Refer to the link below for more information.
Reference: https://argo-cd.readthedocs.io/en/stable/user-guide/sync-options/
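
Because ignoreDifferences affects only the diff calculation, the Argo CD sync options documentation describes an additional RespectIgnoreDifferences=true option that makes the sync stage honor the same exclusions. The fragment below is a sketch of combining it with Server-Side Apply. Whether it suits a given cluster is an assumption to validate, and according to that documentation the option only takes effect for resources that already exist in the cluster.

```yaml
spec:
  ignoreDifferences:
    - group: cluster.x-k8s.io
      kind: Cluster
      namespace: <namespace>
      jqPathExpressions:
        - '.spec.topology.variables[] | select(.name == "TKR_DATA")'
  syncPolicy:
    syncOptions:
      - ServerSideApply=true
      # Leave fields matched by ignoreDifferences untouched during sync
      - RespectIgnoreDifferences=true
```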