Guest Clusters Fail to Pull TMC Extension Images After Upgrading to TMC Self-Managed 1.4.1
Article ID: 401641

Updated On:

Products

VMware Tanzu Mission Control - SM

Issue/Introduction

After upgrading from TMC Self-Managed version 1.4.0 to 1.4.1, guest clusters may fail to reconcile TMC components and extensions, resulting in pods entering an ImagePullBackOff state in the vmware-system-tmc namespace.

This issue affects extensions such as:

  • extension-updater
  • cluster-secret
  • agent-updater
  • sync-agent
  • cluster-auth-pinniped
  • tmc-observer
  • inspection
  • policy-sync-extension
  • gatekeeper-operator

The problem arises when the upgraded system attempts to pull container images for these extensions but fails due to incorrect or outdated image registry paths.

Symptoms:

  • kubectl get pods -n vmware-system-tmc shows ImagePullBackOff status for multiple pods
  • kubectl describe pod <name> reveals failed image pulls from:
    • harbor.<domain>:8443/tmc/498533941640.dkr.ecr.us-west-2.amazonaws.com/...
  • The newly required image registry path (tap-tmc-docker-virtual.usw1.packages.broadcom.com) is not being used consistently (the commands below show which registry each pod is pulling from)
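For example, the image reference each pod in the namespace is attempting to pull can be listed with commands such as the following, run against the affected guest cluster:

kubectl -n vmware-system-tmc get pods

kubectl -n vmware-system-tmc get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'

Pods whose image references resolve to the old 498533941640.dkr.ecr.us-west-2.amazonaws.com path are the ones affected by this issue.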

Environment

VMware Tanzu Mission Control - SM

Cause

TMC Self-Managed 1.4.1 introduced a change to the default registry path for extension images, migrating from:

498533941640.dkr.ecr.us-west-2.amazonaws.com

to:

tap-tmc-docker-virtual.usw1.packages.broadcom.com

However, during the upgrade process from version 1.4.0 to 1.4.1, not all components or extension metadata were updated to reflect this new registry path. As a result:

  • Some components in guest clusters continue to reference the outdated registry
  • When the 1.4.1 image tarball is pushed to Harbor, some images are placed under the new tap- path, while the guest clusters still try to pull from the old path
  • The image pull fails because those images do not exist at the outdated location

This manifests as ImagePullBackOff on the affected pods and as partial failures across extension lifecycle management.
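To confirm where the 1.4.1 images were actually pushed, the repositories in the Harbor tmc project can be listed through Harbor's v2.0 API, for example (a sketch that assumes API access to Harbor and that jq is available; substitute your own registry hostname and credentials):

curl -sk -u '<harbor-user>:<harbor-password>' 'https://harbor.<domain>:8443/api/v2.0/projects/tmc/repositories?page_size=100' | jq -r '.[].name'

Repository names containing tap-tmc-docker-virtual.usw1.packages.broadcom.com hold the images expected by 1.4.1; guest clusters still pulling from the old 498533941640.dkr.ecr.us-west-2.amazonaws.com path will not find their images there.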

Resolution

This issue is resolved in TMC Self-Managed 1.4.2. Customers are encouraged to upgrade to 1.4.2 to permanently avoid this behavior.

For environments still running 1.4.1 and experiencing this issue, the following workaround may be applied:

Restart the cluster-agent-service-server deployment in the tmc-local namespace to trigger extension metadata reconciliation and correct the registry references.

Run:

kubectl -n tmc-local rollout restart deploy cluster-agent-service-server

This forces the cluster-agent-service to refresh the extension image registry paths and resync the correct container image references. Within 5-10 minutes, guest cluster components should begin pulling images from the correct registry, and the failed pods should recover.
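The progress of the restart, and the subsequent recovery of the guest cluster pods, can be watched with standard kubectl commands, for example:

kubectl -n tmc-local rollout status deploy cluster-agent-service-server

kubectl -n vmware-system-tmc get pods -w

The first command runs against the TMC Self-Managed management cluster; the second runs against the affected guest cluster. Once the corrected registry paths propagate, the previously failing pods should pull their images and return to Running.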

Note:

This workaround does not delete or reset cluster state. It only triggers reconciliation logic that corrects stale registry path entries in the service’s internal metadata.

Additional Information

Querying the extension registry mappings in the cluster-agent-service database reveals mixed image registry references across extensions within the same cluster.

To confirm, run the following SQL query against the cluster-agent-service database:

Connect to the database:

psql $(kubectl -n tmc-local get secrets cluster-agent-postgres-creds -o json | jq -r '.data.PGURL | @base64d | sub("postgres-postgresql"; "127.0.0.1") | sub("5432"; "15432")')
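The jq filter above rewrites the stored PGURL so that psql connects to 127.0.0.1:15432 rather than the in-cluster database host. If the database is not already reachable on that local port, a port-forward along the following lines can be used first (this assumes the PostgreSQL service in tmc-local is named postgres-postgresql, as the PGURL suggests):

kubectl -n tmc-local port-forward svc/postgres-postgresql 15432:5432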

Run the following query:

select name, cluster_name, image_registry from extension;

You may observe a mix of:

  • harbor.<domain>:8443/tmc/498533941640.dkr.ecr.us-west-2.amazonaws.com
  • harbor.<domain>:8443/tmc/tap-tmc-docker-virtual.usw1.packages.broadcom.com
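To narrow the output to extensions that still reference the outdated registry, the same query can be filtered, for example:

select name, cluster_name, image_registry from extension where image_registry like '%498533941640.dkr.ecr.us-west-2.amazonaws.com%';

Any rows returned identify the clusters and extensions that have not yet been reconciled to the new registry path.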