When upgrading TKGm from 2.3.x to 2.4.x, the upgrade fails on the first new control plane node. On this node, the following error appears in /var/log/cloud-init-output.log:
failed to pull image projects.registry.vmware.com/tkg/coredns:v1.8.6_vmware.26
This is unexpected: in an air-gapped environment, the node should never try to pull from the public registry.
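You can confirm that a failing node is hitting this issue, and not some other upgrade failure, with a check like the one below. It is a minimal sketch. The function name check_coredns_pull_failure is hypothetical. The default path is the standard cloud-init output log, and the match string is taken from the error shown above.

```shell
#!/usr/bin/env bash
# Sketch: report whether a cloud-init output log shows the coredns pull failure.
# check_coredns_pull_failure is a hypothetical helper name, not a TKG tool.

check_coredns_pull_failure() {
  local log_file="${1:-/var/log/cloud-init-output.log}"

  # Distinguish "log unreadable" (exit 2) from "no match" (exit 1).
  if [ ! -r "$log_file" ]; then
    echo "cannot read ${log_file}" >&2
    return 2
  fi

  # -F: fixed-string match; -q: set the exit status only, print nothing.
  if grep -Fq "failed to pull image projects.registry.vmware.com/tkg/coredns:" "$log_file"; then
    echo "coredns pull failure found in ${log_file}"
    return 0
  fi

  echo "no coredns pull failure in ${log_file}"
  return 1
}
```

Source the snippet on the node and call check_coredns_pull_failure as root. The log may be readable only by privileged users.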
This issue affects TKGm 2.3.x clusters.
The exact cause of this issue is unknown. It appears to occur when upgrading a very old cluster, i.e., a cluster that was originally created on a version earlier than 1.6.x and later upgraded to 2.3.x.
After the upgrade starts and the first new control plane (CP) node has been created, SSH into that node and manually pull the missing image from your offline registry:
crictl pull offline.repo.example.com:5000/tkg/coredns:v1.8.6_vmware.26
You must also tag the image as "projects.registry.vmware.com/tkg/coredns:v1.8.6_vmware.26", because that is the image reference the node expects:
ctr --namespace=k8s.io image tag offline.repo.example.com:5000/tkg/coredns:v1.8.6_vmware.26 projects.registry.vmware.com/tkg/coredns:v1.8.6_vmware.26
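The pull and tag commands can be combined into one function so the step is easy to repeat on each new CP node. This is a minimal sketch. fix_coredns_image, run, OFFLINE_REGISTRY and DRY_RUN are hypothetical names, and offline.repo.example.com:5000 is the same placeholder registry used in the commands above. Run it as root, because both crictl and ctr need access to the containerd socket.

```shell
#!/usr/bin/env bash
# Sketch: pull coredns from an offline registry, then add the public tag the node expects.
# OFFLINE_REGISTRY is a placeholder; replace it with your own registry host:port.

OFFLINE_REGISTRY="${OFFLINE_REGISTRY:-offline.repo.example.com:5000}"
COREDNS_PATH="tkg/coredns:v1.8.6_vmware.26"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fix_coredns_image() {
  local registry="${1:-$OFFLINE_REGISTRY}"
  local offline_ref="${registry}/${COREDNS_PATH}"
  local public_ref="projects.registry.vmware.com/${COREDNS_PATH}"

  # Pull through the CRI, which stores images in containerd's k8s.io namespace.
  run crictl pull "$offline_ref" || return 1
  # Add the public reference as an extra tag on the image just pulled.
  run ctr --namespace=k8s.io image tag "$offline_ref" "$public_ref" || return 1
}
```

Use DRY_RUN=1 fix_coredns_image to preview the commands, then run fix_coredns_image without it to apply them. Afterwards, crictl images | grep coredns should list both the offline and the public reference.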
Repeat these steps on each new CP node as it is created. Once all new CP nodes have the image, the worker nodes deploy without issue.
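The same steps can also be run from a jump host over SSH. During a rolling upgrade, new CP nodes are typically created one at a time, so in practice you would call this with each node's address as that node appears. This is a minimal sketch that rests on several assumptions. The SSH user defaults to capv, the usual node user for TKG on vSphere. That user is assumed to have passwordless sudo. fix_all_cp_nodes, remote_fix_cmd and the environment variables are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: apply the coredns pull-and-tag workaround to CP nodes over SSH.
# Assumptions: SSH_USER "capv" (TKG on vSphere default) with passwordless sudo;
# OFFLINE_REGISTRY is a placeholder for your registry host:port.

SSH_USER="${SSH_USER:-capv}"
OFFLINE_REGISTRY="${OFFLINE_REGISTRY:-offline.repo.example.com:5000}"
COREDNS_PATH="tkg/coredns:v1.8.6_vmware.26"

# Build the command string that runs on each node.
remote_fix_cmd() {
  local offline_ref="${OFFLINE_REGISTRY}/${COREDNS_PATH}"
  local public_ref="projects.registry.vmware.com/${COREDNS_PATH}"
  printf 'sudo crictl pull %s && sudo ctr --namespace=k8s.io image tag %s %s' \
    "$offline_ref" "$offline_ref" "$public_ref"
}

fix_all_cp_nodes() {
  local node cmd rc=0
  if [ "$#" -eq 0 ]; then
    echo "usage: fix_all_cp_nodes NODE_ADDRESS..." >&2
    return 2
  fi
  cmd="$(remote_fix_cmd)"
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the ssh invocation instead of running it.
      echo "ssh ${SSH_USER}@${node} '${cmd}'"
    else
      echo "==> ${node}"
      # Keep going on failure, but remember it for the return status.
      ssh "${SSH_USER}@${node}" "$cmd" || { echo "failed on ${node}" >&2; rc=1; }
    fi
  done
  return "$rc"
}
```

For example, DRY_RUN=1 fix_all_cp_nodes 10.0.0.11 prints the ssh command for review before you run it for real.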