Kubernetes Service package reconciliation deadlock after using pull/push for service bundles in an air-gapped environment

Article ID: 437632

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In air-gapped VMware Cloud Foundation or vSphere environments, upgrading the vSphere Kubernetes Service may result in a permanent reconciliation deadlock. This occurs if the service bundles were relocated to a private registry using docker pull/push or imgpkg pull/push instead of the recursive imgpkg copy method.

Symptoms include:

  • The svc-tkg Supervisor Service remains in a "Reconciling" or "Degraded" state indefinitely.

  • Sub-packages fail to pull images, referencing public registry URLs (e.g., projects.registry.vmware.com) despite the environment being isolated.

  • The kapp-controller fails to process corrected configuration changes even after the image registry paths are remediated.
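
To confirm the second symptom, it can help to list which Package objects still carry a public registry reference. The following is a minimal sketch, not a supported tool: the helper name find_stale_packages and the PUBLIC_REGISTRY variable are illustrative, and it assumes the stale reference lives in the imgpkgBundle fetch section of the Carvel Package resource.

```shell
#!/usr/bin/env bash
# Sketch: list Carvel Package objects whose imgpkg bundle still points at a
# public registry. PUBLIC_REGISTRY is an assumed default taken from the
# symptom text; adjust it to the hostname seen in your own error messages.
PUBLIC_REGISTRY="${PUBLIC_REGISTRY:-projects.registry.vmware.com}"

# Hypothetical helper: prints "<namespace>/<name><TAB><image>" for each match.
# Prints nothing (and still returns 0) when no stale reference is found.
find_stale_packages() {
  kubectl get packages.data.packaging.carvel.dev --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.template.spec.fetch[*].imgpkgBundle.image}{"\n"}{end}' \
    | grep -F "$PUBLIC_REGISTRY" || true
}
```

After loading the function, run find_stale_packages from a session that has kubectl access to the Supervisor. Any line it prints identifies a package that was not relocated recursively.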

Environment

  • VMware Cloud Foundation

  • vSphere Kubernetes Service 

  • Air-gapped / Disconnected deployments

Cause

This issue is caused by the way kapp-controller prioritizes reconciliation. When a PackageInstall specification has not changed (e.g., the version remains the same), the controller treats its reconciliation as a "no-op". In an air-gapped environment where the sub-packages still contain stale public registry references, these "no-op" reconciliation attempts fail. Because the controller schedules them in its first reconciliation batch, the updated Package data (containing the fix) is never processed, and the service remains stuck in a repeating failure loop.

Resolution

To resolve this deadlock, kapp-controller must be forced to re-evaluate the child packages, either by modifying their metadata manually or by running the package override script included with the Supervisor.

Manual Patching Procedures (For deployments prior to vCenter Server 8.0 U3j)

  1. Pause the parent PackageInstall to stop the current reconciliation attempts:

    kubectl patch pkgi/svc-tkg.vsphere.vmware.com \
    --namespace vmware-system-supervisor-services \
    --type='json' \
    --patch='[{"op": "add", "path": "/spec/paused", "value":true}]'
    
  2. Identify the <Domain Namespace> (e.g., svc-tkg-domain-c####) and apply a dummy label to all child packages to force a state change:

    kubectl patch pkgi -n <Domain Namespace> --type='merge' --patch '{"metadata":{"labels":{"fix-trigger":"true"}}}' tanzu-addons-manager
    kubectl patch pkgi -n <Domain Namespace> --type='merge' --patch '{"metadata":{"labels":{"fix-trigger":"true"}}}' tkg-controller
    # Repeat for all child packages: tanzu-cliplugins, tanzu-cluster-api, runtime-extension, tkr-service, etc.
    
  3. Unpause the parent PackageInstall:

    kubectl patch pkgi/svc-tkg.vsphere.vmware.com \
    --namespace vmware-system-supervisor-services \
    --type='json' \
    --patch='[{"op": "remove", "path": "/spec/paused"}]'
    
  4. Toggle the paused status of the App resource to trigger an immediate update:

    kubectl patch apps/svc-tkg.vsphere.vmware.com -n vmware-system-supervisor-services --type='json' -p='[{"op": "add", "path": "/spec/paused", "value":true}]'
    kubectl patch apps/svc-tkg.vsphere.vmware.com -n vmware-system-supervisor-services --type='json' -p='[{"op": "remove", "path": "/spec/paused"}]'
    
  5. Monitor the status until the DESCRIPTION field no longer shows "Reconciling":

    kubectl get app svc-tkg.vsphere.vmware.com -n vmware-system-supervisor-services
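
The five steps above can be sketched as a single script. This is an illustration under stated assumptions rather than a supported utility: the function names are hypothetical, and label_children assumes that every PackageInstall in the domain namespace is a child package of svc-tkg. Review the list returned by kubectl get pkgi in that namespace before relying on it.

```shell
#!/usr/bin/env bash
# Sketch of the manual procedure. The domain namespace is supplied by the
# operator (e.g., svc-tkg-domain-c####); all function names are hypothetical.
PARENT="svc-tkg.vsphere.vmware.com"
PARENT_NS="vmware-system-supervisor-services"

# Pause or unpause the parent PackageInstall or App.
set_paused() {  # usage: set_paused <pkgi|app> <true|false>
  local kind="$1" paused="$2" patch
  if [ "$paused" = "true" ]; then
    patch='[{"op": "add", "path": "/spec/paused", "value": true}]'
  else
    patch='[{"op": "remove", "path": "/spec/paused"}]'
  fi
  kubectl patch "$kind/$PARENT" --namespace "$PARENT_NS" --type=json --patch "$patch"
}

# Apply the dummy label to every PackageInstall in the domain namespace.
# Assumption: all PackageInstalls found there are child packages.
label_children() {  # usage: label_children <domain-namespace>
  local ns="$1" child
  kubectl get pkgi --namespace "$ns" -o name | while read -r child; do
    kubectl patch "$child" --namespace "$ns" --type=merge \
      --patch '{"metadata":{"labels":{"fix-trigger":"true"}}}'
  done
}

run_manual_fix() {  # usage: run_manual_fix <domain-namespace>
  local ns="$1"
  set_paused pkgi true  || return 1   # step 1: pause the parent
  label_children "$ns"  || return 1   # step 2: label the child packages
  set_paused pkgi false || return 1   # step 3: unpause the parent
  set_paused app true   || return 1   # step 4: toggle the App resource
  set_paused app false  || return 1
  kubectl get app "$PARENT" --namespace "$PARENT_NS"   # step 5: check status
}
```

Invoke it as run_manual_fix <Domain Namespace>, then repeat the final kubectl get app command until the DESCRIPTION field no longer shows "Reconciling".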

 

Using the Package Override Script (For vCenter Server 8.0 U3j and later)

  1. Log in to the Supervisor control plane over SSH.

  2. Run the integrated package override script to point the package at the corrected image reference:

    /usr/lib/vmware-wcp/override-package-image.sh \
      -p "tkg.vsphere.vmware.com.3.5.1-embedded+v1.34" \
      -i "my-registry.example.com/tkg-svs/package/tkg-service:3.5.1"

The script accepts the following options:

  • -p, --package-name: Name of the package to override (e.g., tkg.vsphere.vmware.com.3.5.1-embedded+v1.34)

  • -i, --image: The corrected image reference in the private registry

  • -n, --namespace: The namespace of the package (defaults to vmware-system-supervisor-services)

  • --dry-run: Shows the changes that would be made without applying them

  • -h, --help: Displays usage information
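
One cautious way to use these options is to preview the change with --dry-run before applying it. The sketch below assumes that --dry-run can be combined with -p and -i and that the script exits non-zero when the preview fails; neither is confirmed here. OVERRIDE_SCRIPT and override_package are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: preview the override with --dry-run, then apply it only if the
# preview succeeds. Assumes a non-zero exit code signals a failed preview.
OVERRIDE_SCRIPT="${OVERRIDE_SCRIPT:-/usr/lib/vmware-wcp/override-package-image.sh}"

override_package() {  # usage: override_package <package-name> <image>
  local package="$1" image="$2"
  # First pass: show the intended changes without writing anything.
  "$OVERRIDE_SCRIPT" -p "$package" -i "$image" --dry-run || return 1
  # Second pass: apply the override.
  "$OVERRIDE_SCRIPT" -p "$package" -i "$image"
}
```

For example: override_package "tkg.vsphere.vmware.com.3.5.1-embedded+v1.34" "my-registry.example.com/tkg-svs/package/tkg-service:3.5.1".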

 

Additional Information

Always use imgpkg copy for air-gapped bundle relocation to ensure all nested image references are correctly updated to the target private registry.
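
A typical air-gapped relocation with imgpkg copy is a two-stage transfer through a tarball. The sketch below wraps the two stages in hypothetical helper functions. The source bundle reference, tarball path, and target repository are placeholders that the operator must supply.

```shell
#!/usr/bin/env bash
# Sketch: relocate a bundle across an air gap with imgpkg copy.
# All arguments are placeholders supplied by the operator.

# Run on a host with internet access: saves the bundle and every image it
# references into a single tarball.
export_bundle() {  # usage: export_bundle <source-bundle> <tarball>
  local source_bundle="$1" tarball="$2"
  imgpkg copy -b "$source_bundle" --to-tar "$tarball"
}

# Run on a host that can reach the private registry: uploads the bundle and
# its nested images to the target repository.
import_bundle() {  # usage: import_bundle <tarball> <target-repo>
  local tarball="$1" target_repo="$2"
  imgpkg copy --tar "$tarball" --to-repo "$target_repo"
}
```

Unlike pull/push, this path carries the nested images along with the bundle, so references resolve against the private registry after import.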