When deploying a TKG 2.5.4 management cluster, the control plane node is up, but kubelet is stuck in the activating state and containerd fails PullImage with the following error: failed to pull and unpack image "<imageregistry>/etcd:v3.5.16_vmware.3": failed to resolve reference "<imageregistry>/etcd:v3.5.16_vmware.3".
After connecting over SSH to the stuck management cluster's control plane node, 'crictl ps -a' shows no containers at all, so none are in the Running state, and 'crictl images' does not list any images from the imageregistry.
The following entries are observed in the kubelet and containerd journalctl logs.
kubelet
Dec 12 09:27:20 mgmt-cluster-controlplane-##### kubelet[1747]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Dec 12 09:27:20 mgmt-cluster-controlplane-##### kubelet[1747]: I1212 09:27:20.444226 1747 server.go:209] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the>
Dec 12 09:27:20 mgmt-cluster-controlplane-##### kubelet[1747]: E1212 09:27:20.444335 1747 run.go:74] "command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet>
Dec 12 09:27:20 mgmt-cluster-controlplane-##### systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
containerd
Dec 12 10:11:07 mgmt-cluster-controlplane-##### containerd[1088]: time="2025-12-12T10:11:07.500046759Z" level=error msg="PullImage \"<imageregistry>/etcd:v3.5.16_vmware.3\" failed" error="failed to pull and unpack image \"<imageregistry>/etcd:v3.5.16_vmware.3\": failed to resolve reference \"<imageregistry>/etcd:v3.5.16_vmware.3\": failed to do request: Head \"https://<imageregistry>/etcd/manifests/v3.5.16_vmware.3\": dial tcp: lookup <imageregistry> on 127.0.0.53:53: read udp 127.0.0.1:32982->127.0.0.53:53: i/o timeout"
Dec 12 10:11:07 mgmt-cluster-controlplane-##### containerd[1088]: time="2025-12-12T10:11:07.500114475Z" level=info msg="stop pulling image <imageregistry>/etcd:v3.5.16_vmware.3: active requests=0, bytes read=0"
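The state and log entries above can be gathered on the control plane node with a short script. This is a minimal sketch, not an official procedure: it assumes sudo access on the node, and the function names collect_node_state and filter_pull_errors are hypothetical helpers introduced only for illustration. Using --no-pager should also avoid the line truncation (the trailing ">") visible in the kubelet entries above.

```shell
#!/usr/bin/env bash
# Sketch: gather kubelet/containerd state on the control plane node.
# Assumption: sudo is available and needed for crictl and journalctl.

collect_node_state() {
    # Service state; kubelet is expected to report "activating" in this scenario.
    systemctl is-active kubelet containerd

    # Containers and images known to the container runtime.
    sudo crictl ps -a
    sudo crictl images

    # Last 50 log entries per service, printed without the pager.
    sudo journalctl -u kubelet --no-pager -n 50
    sudo journalctl -u containerd --no-pager -n 50
}

# Hypothetical helper: read containerd log text on stdin and keep only
# the error-level PullImage entries.
filter_pull_errors() {
    grep -E 'level=error.*PullImage'
}
```

For example, run collect_node_state for the overview, then 'sudo journalctl -u containerd --no-pager | filter_pull_errors' to isolate the failed pulls.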
Running 'nslookup <imageregistry> <DNS_IP>' returns a message similar to the following:
;; communications error to ###.##.###.###53: timed out
;; no servers could be reached
The same timeout is observed when running nslookup against each of the DNS server IPs reported by resolvectl.
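The per-server check can be repeated in one pass with a small loop. The sketch below assumes that nslookup exits non-zero when a lookup fails or times out; check_dns is a hypothetical helper name, and the 3-second timeout is an arbitrary choice. The DNS server IPs are passed in by hand (for example, copied from the output of 'resolvectl dns') rather than parsed automatically.

```shell
#!/usr/bin/env bash
# Sketch: query the registry FQDN against each configured DNS server.
# Usage: check_dns FQDN DNS_IP [DNS_IP...]
check_dns() {
    local fqdn="$1"
    shift
    local failed=0 server

    for server in "$@"; do
        # -timeout=3 bounds the wait per attempt; the output is discarded
        # because only the exit status is used here.
        if nslookup -timeout=3 "$fqdn" "$server" >/dev/null 2>&1; then
            echo "OK      $server resolves $fqdn"
        else
            echo "FAILED  $server did not resolve $fqdn"
            failed=1
        fi
    done

    # Non-zero if at least one server failed.
    return "$failed"
}
```

Invoke it as 'check_dns <imageregistry> <DNS_IP_1> <DNS_IP_2>'. A FAILED line for every server matches the symptom described here, while a mix of OK and FAILED would point at a single DNS server instead.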
TKGm 2.5.4
AVI
The management cluster control plane is stuck in creation because it cannot pull images from the configured imageregistry. As a result, kubelet cannot be brought to a running state and the etcd image cannot be obtained to configure etcd.
The kubelet is in the activating state instead of active (running). /var/lib/kubelet/config.yaml cannot be obtained and therefore cannot be loaded. Other required images, such as etcd, also cannot be pulled, so their containers never reach the Running state.
PullImage fails because the configured DNS servers cannot be reached to resolve the imageregistry's FQDN, so the control plane nodes cannot pull the images they need to complete their bootstrap.
Have your network team investigate the communication issue between the management cluster's network and the DNS servers, and allow DNS traffic (port 53) between them.
If containerd on the management cluster's control plane node still does not progress past PullImage after communication is restored, delete and recreate the management cluster.
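Before deciding whether to delete and recreate the management cluster, one way to confirm that the network change took effect is to retry the resolution and the pull by hand on the control plane node. This is a sketch under assumptions: registry_host and verify_pull are hypothetical helper names, sudo access is assumed, and the image reference is assumed to begin with the registry host, optionally followed by a port.

```shell
#!/usr/bin/env bash
# Sketch: re-test DNS resolution and a manual image pull after the network fix.

# Extract the registry host from an image reference such as
# "registry.example.test:8443/path/etcd:tag".
registry_host() {
    local ref="${1%%/*}"          # drop everything from the first "/"
    printf '%s\n' "${ref%%:*}"    # drop an optional ":port"
}

# Usage: verify_pull IMAGE
verify_pull() {
    local image="$1" host
    host=$(registry_host "$image")

    # Uses the node's default resolver configuration.
    if ! nslookup "$host" >/dev/null 2>&1; then
        echo "DNS still failing for $host"
        return 1
    fi

    if ! sudo crictl pull "$image"; then
        echo "pull still failing: $image"
        return 1
    fi

    echo "pull succeeded: $image"
}
```

For example, 'verify_pull "<imageregistry>/etcd:v3.5.16_vmware.3"'. A successful manual pull only shows that DNS and the registry are reachable again; it does not by itself show that the node's bootstrap has resumed.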