Supervisor Services failed to get images when deploying

Products

VMware vSphere Kubernetes Service

Issue/Introduction

Issue Clarification:
The kubelet failed to pull a Supervisor Service image, in this case for the Consumption Interface (cci-service), because of a timeout. The error message indicated that the image was stuck in the "resolving" state on an ESXi node.

Issue Verification:
1. Verified that the image resolution was successful.
2. Confirmed that the image remained in a "resolving" state despite resolution success.
3. Observed that the affected envoy pods were not starting due to image issues.

Environment

vCenter 8.0U3

vSphere Supervisor

Cause

The "resolving" state typically indicates an issue with the image resolution process in the container runtime, such as:
1. Incomplete or corrupted image resolution on the node.
2. Residual artifacts (e.g., imagedisks) associated with the problematic image.

Resolution

1. First, check storage: datastores, storage policies and resource limits. Make sure that sufficient space is available and allocated.

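2. Delete the imagedisks and images for cci. The names differ in each environment, so list them first (the commands in step 4 show how) and substitute them into the delete commands below.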

kubectl delete -n vmware-system-kubeimage imagedisk.imagecontroller.vmware.com/1afcb3c1e07f65f30e1b0c3842a0aca44634cb3af27f4829027d40f42a9c83c7-v4484250
kubectl delete -n vmware-system-kubeimage imagedisk.imagecontroller.vmware.com/4dfaf62e4c45e48a1fb49557dbd6e812a4f7c7e51b81dbc895849c448e9bbcb9-v38231648
kubectl delete -n svc-cci-service-domain-cXXXX image.imagecontroller.vmware.com/cci-namespace-ui-se-d2e627b6ba4f8a15ebc193b46b50326fe80def61-v65467
kubectl delete -n svc-cci-service-domain-cXXXX image.imagecontroller.vmware.com/cci-supervisor-serv-7bc683ef56feae22a116a6f0e5ee94eb5e523f56-v73275
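The names in the commands above are specific to one environment. As a rough sketch, the delete commands could be generated from the image list rather than typed by hand. It assumes that `kubectl get image -A` prints the namespace first, the image name second and the imagedisk name last (the imagedisk check in step 4 relies on the same last column), and that the imagedisks live in vmware-system-kubeimage as in the commands above. gen_cci_cleanup is a hypothetical helper name, not a VMware tool.

```shell
# Sketch: print (do not run) delete commands for the cci images and their imagedisks.
# Assumed input: the output of `kubectl get image -A`, with NAMESPACE in column 1,
# NAME in column 2 and the imagedisk name in the last column.
gen_cci_cleanup() {
  awk '
    $1 ~ /^svc-cci/ {
      disks[$NF] = 1                                          # de-duplicate shared imagedisks
      images[$1 " image.imagecontroller.vmware.com/" $2] = 1
    }
    END {
      # imagedisks first, then images, matching the order used in this step
      for (d in disks)  print "kubectl delete -n vmware-system-kubeimage imagedisk.imagecontroller.vmware.com/" d
      for (i in images) print "kubectl delete -n " i
    }'
}

# Usage (review the generated commands before running any of them):
#   kubectl get image -A | gen_cci_cleanup
```

The helper only prints commands, in the same spirit as the pod deletion in step 3, so nothing is deleted until the output has been checked and pasted.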

3. Delete the consumption interface pods. Redeploying the pods should force the system to fetch the image afresh, bypassing the "resolving" state issue.

# set the variable for the consumption interface namespace (check that this is the correct name)
export TNS="svc-cci-service-domain-cXXXX"

# check the pods in the namespace
kubectl get pods -n "$TNS"

# generate the kubectl delete commands for the cci pods in the namespace
kubectl get pods -n "$TNS" | awk -v tns="$TNS" '$0 ~ /cci/ {print "kubectl delete pod -n", tns, $1}'

# *** Copy the generated commands and delete the pods ***

# check that the pods in the namespace come back; this may take a while
kubectl get pods -n "$TNS"
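Rather than reading the pod list by eye, the check can be reduced to a small filter. This is only a sketch: pods_not_running is a hypothetical helper, and it assumes the default `kubectl get pods` column layout (NAME READY STATUS RESTARTS AGE).

```shell
# Sketch: list pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods` output on stdin; exits non-zero if any such pod is found.
pods_not_running() {
  awk '
    $1 != "NAME" && $3 != "Running" && $3 != "Completed" {
      print $1, $3      # pod name and its current status
      bad = 1
    }
    END { exit bad + 0 }'
}

# Usage:
#   kubectl get pods -n "$TNS" | pods_not_running && echo "all pods are Running"
```

An empty result with exit status 0 means every listed pod is Running or Completed. Pods still pulling images show up with a status such as ContainerCreating or ErrImagePull.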

4. Check that the images and imagedisks have been recreated.

# check the images
kubectl get image -A | egrep "^NAME|svc-cci"

# check the image disks
kubectl get imagedisks -A | grep -E "$(kubectl get image -A | egrep "^NAME|svc-cci" | awk ' BEGIN { ORS ="|"} $0 ~ /cci/ {print $NF}')NAME"

# For reference, these imagedisks are typically around 80Mi and 820Mi
# kubectl get imagedisks -A | grep -E "$(kubectl get image -A | egrep "^NAME|svc-cci" | awk ' BEGIN { ORS ="|"} $0 ~ /cci/ {print $NF}')NAME"
# NAMESPACE NAME STATUS DISK SIZE
# vmware-system-kubeimage 1afcb3c1e07f65f30e1b0c3842a0aca44634cb3af27f4829027d40f42a9c83c7-v12325951 Ready 868fdc15-13a0-4501-aa51-6c7faa794765 80012Ki
# vmware-system-kubeimage 7719d42ac092eb4f5168501316737db333d07782ea8e6d7d728452216cef7936-v8525705 Ready 03a4aeb0-551c-4e8f-9b71-20a2d9d153dd 837621Ki
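The STATUS column in the sample output above can also be checked with a small filter. The following is a sketch under the assumption that the imagedisk listing has the columns shown above (NAMESPACE NAME STATUS DISK SIZE). imagedisks_not_ready is a hypothetical helper name.

```shell
# Sketch: report imagedisks whose STATUS is not "Ready".
# Reads `kubectl get imagedisks -A` output (optionally filtered) on stdin.
# Exits non-zero if any imagedisk is not Ready, or if no imagedisk rows were seen.
imagedisks_not_ready() {
  awk '
    $1 == "NAMESPACE" { next }     # skip the header row
    { n++ }
    $3 != "Ready" {
      print $2, $3                 # imagedisk name and its current status
      bad = 1
    }
    END { exit (bad || n == 0) }'
}

# Usage:
#   kubectl get imagedisks -A | imagedisks_not_ready && echo "all imagedisks are Ready"
```

Treating an empty listing as a failure is a deliberate choice here: if the imagedisks have not been recreated yet, the check should not report success.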