Containers are in Init:ErrImageNeverPull error state

Article ID: 306254

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • The environment is unusable because its services are not up and running. Pods in the Init:ErrImageNeverPull or ErrImageNeverPull state appear on one or more nodes. To see the states of the pods and the nodes they run on, execute:
kubectl get pods -n prelude -o wide

Example of pods in this error state:

assessment-service-app-648bb587b4-24nw8        0/1     Init:ErrImageNeverPull   0          5h30m   10.244.0.235   prelude-004.eng.vmware.com   <none>           <none>
symphony-logging-daemonset-7phb9               0/1     ErrImageNeverPull        0          5h12m   10.244.0.239   prelude-004.eng.vmware.com   <none>           <none>
tango-blueprint-service-app-7c69f68d4c-2l8xg   0/1     Init:ErrImageNeverPull   0          5h28m   10.244.0.236   prelude-004.eng.vmware.com   <none>           <none>
tango-vro-gateway-app-7984985b88-rs7hs         0/1     Init:ErrImageNeverPull   0          5h35m   10.244.0.234   prelude-004.eng.vmware.com   <none>           <none>
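
To see at a glance which nodes are affected, the listing can be summarized per node. The following is a minimal sketch, not part of the original procedure: the function name count_never_pull_by_node is hypothetical, and it assumes the column layout of the example output above, where STATUS is the third column and the node name is the third column from the end.

```shell
# Hypothetical helper: reads `kubectl get pods -o wide` output on stdin and
# prints "<count> <node>" for pods whose STATUS ends in ErrImageNeverPull
# (this matches both ErrImageNeverPull and Init:ErrImageNeverPull).
count_never_pull_by_node() {
  awk '$3 ~ /ErrImageNeverPull$/ { n[$(NF-2)]++ }
       END { for (node in n) print n[node], node }' | sort
}

# Assumed usage:
#   kubectl get pods -n prelude -o wide | count_never_pull_by_node
```

The node is read from the end of the line rather than from a fixed position so that a multi-word RESTARTS value does not shift the result.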


Environment

VMware vRealize Automation 8.x

Cause

This issue can have several causes:
  • Ephemeral storage in Prelude has consumed 100% of the disk
  • One or more of the Prelude storage disks are completely full or more than 80% full

(For example, a /data disk with only 17% free space is more than 80% used, which is a problem.)

  • The node has been restarted due to an unhealthy node status
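
The disk-usage causes above can be checked on each node with df. The following is a minimal sketch under stated assumptions: the function name disks_over_threshold is hypothetical, the 80% default follows the threshold mentioned above, and the input is expected in the POSIX df -P layout (usage percentage in column 5, mount point in column 6, no spaces in mount paths).

```shell
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# point and usage of every filesystem at or above the given percentage
# (default 80).
disks_over_threshold() {
  awk -v limit="${1:-80}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                       # "83%" -> "83"
    if (use + 0 >= limit + 0) print $6, use "%"
  }'
}

# Assumed usage on a node:
#   df -P | disks_over_threshold 80
```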

Resolution

Steps to recover from this state:

  • Resize the affected disk, adding at least 20 GB; the more space added, the better
    • Resizing is done through vSphere
  • Reboot the affected node and wait about 30-50 minutes for the services to return to a normal state


As an alternative to the reboot, execute the “/opt/scripts/restore_docker_images.sh” script on the affected node(s).
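
After the reboot or the script run, recovery can be confirmed by polling the pod list until nothing is left unready. The following is only a sketch: the function names not_ready_pods and wait_for_prelude are hypothetical, the 60-second interval is an arbitrary choice, and the 50-minute default simply mirrors the upper end of the waiting time mentioned above.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the name and status of pods whose READY column (ready/total) is not yet
# complete, skipping the header and pods that have finished (Completed).
not_ready_pods() {
  awk '$1 != "NAME" && $3 != "Completed" {
    split($2, r, "/")
    if (r[1] != r[2]) print $1, $3
  }'
}

# Hypothetical polling loop: checks once a minute and returns 0 as soon as
# no pod is unready, or 1 after the timeout in minutes (default 50).
wait_for_prelude() {
  timeout_min="${1:-50}"
  elapsed=0
  pending=""
  while [ "$elapsed" -lt "$timeout_min" ]; do
    # Only trust an empty result if kubectl itself succeeded.
    if out=$(kubectl get pods -n prelude 2>/dev/null); then
      pending=$(printf '%s\n' "$out" | not_ready_pods)
      [ -z "$pending" ] && return 0
    fi
    sleep 60
    elapsed=$((elapsed + 1))
  done
  printf '%s\n' "$pending"
  return 1
}
```

Running wait_for_prelude on a node with kubectl access prints the pods still pending if the timeout is reached.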