During the installation or upgrade of the Cloud Consumption Interface (CCI) in a VMware Cloud Foundation (VCF) environment, pods may fail to start and remain in ErrImagePull status.
Checking the pod state logs at /var/log/vmware/podstate reveals the following error:
message": "failed to sync pod \"svc-cci-service-domain-<domain id>/cci-service-<uuid>" in the provider: failed to pull images: failed to get images: Image svc-cci-service-domain-<domain id>/cci-supervisor-serv-<uuid> has failed. Error: Failed to resolve on node <ESXI host name>. Reason: Http request failed. Code 400: ErrorType(2) failed to do request: Head \"https://projects.packages.broadcom.com/v2/vcf_cci_service/cci-supervisor-service/manifests/sha256:<SHA Value>\": dial tcp: lookup projects.packages.broadcom.com on <IP of DNS server configured in workload network settings>:53: server misbehaving: ErrImagePull",
This issue occurs when the datapath is not configured to allow DNS traffic from the Workload Network egress IP addresses. These DNS requests originate from the eth1 interface of the Supervisor control VMs. If the path to the DNS server is restricted for this IP range, resolution of the Broadcom registry hostname (projects.packages.broadcom.com) fails with the error shown above.
To resolve this issue, permit DNS traffic (UDP port 53) from the Workload Network egress IP addresses to the DNS server configured in the workload network settings.
To validate traffic on an NSX-backed VKS deployment, use Traceflow from the eth1 interface of a Supervisor control VM to the layer 3 IP address of the DNS server configured as the VKS workload DNS, with the port set to UDP 53.
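As a complement to Traceflow, the DNS path can also be spot-checked with a direct query. The following is a minimal sketch, not a Broadcom-provided tool: it assumes a Python 3 interpreter is available on a machine whose traffic follows the same path as the Supervisor eth1 interface. The function names (build_query, parse_header, check_dns) and the optional source_ip parameter are illustrative. The script sends a single A-record query for the registry hostname to the DNS server over UDP 53 and reports whether a reply came back.

```python
import socket
import struct

# Registry hostname taken from the error message above.
REGISTRY_HOST = "projects.packages.broadcom.com"


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response):
    """Return (query_id, rcode, answer_count) from a DNS response header."""
    if len(response) < 12:
        raise ValueError("DNS response is shorter than the 12-byte header")
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack(
        "!HHHHHH", response[:12]
    )
    return query_id, flags & 0x000F, ancount


def check_dns(server_ip, hostname=REGISTRY_HOST, source_ip=None, timeout=3.0):
    """Send one UDP query to server_ip:53 and describe the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            if source_ip:
                # Hypothetical: bind to the local address on the path under test.
                sock.bind((source_ip, 0))
            sock.sendto(build_query(hostname), (server_ip, 53))
            response, _ = sock.recvfrom(512)
        except socket.timeout:
            return "no reply: UDP 53 may be blocked or the server is unreachable"
        except OSError as exc:
            return f"socket error: {exc}"
    _id, rcode, ancount = parse_header(response)
    if rcode == 0 and ancount > 0:
        return f"resolved: {ancount} answer record(s)"
    return f"server replied with rcode={rcode}, answers={ancount}"
```

For example, calling check_dns("<IP of DNS server>", source_ip="<eth1 IP address>") with the values from your environment returns a one-line summary. A timeout suggests the datapath is still dropping the request, while a non-zero rcode indicates the server was reached but did not resolve the name.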