kubectl get pods -n prelude shows key services in a persistent ContainerCreating, Init:0/1, or ErrImageNeverPull status. Examples of affected pods include:
ccs-k3s-post-install-job-...
idem-service-worker-...

When you run kubectl describe pod <pod-name> -n prelude, the Events section at the bottom shows a misleading "pull access denied" or authentication-related error, even though Aria Automation uses local images. The error may look similar to this:
Failed to pull image "image-name:version": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/image-name:version": failed to resolve reference "docker.io/library/image-name:version": pull access denied, repository does not exist or may require 'docker login'
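To quickly list only the unhealthy pods, a standard kubectl field selector can help (note that pods from completed jobs, which are in a Succeeded phase, will also match this filter):

kubectl get pods -n prelude --field-selector=status.phase!=Running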
VMware Aria Automation 8.x
The primary cause of this issue is an automated, self-preservation mechanism within the appliance's Kubernetes environment.
When an appliance node experiences high resource consumption (typically over 80-90% disk or memory utilization), the kubelet automatically begins a process called "image garbage collection." This process deletes local container images that are not currently in use to free up space and prevent a critical system failure.
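To check whether a node is close to these thresholds, you can inspect disk and memory usage directly on the appliance. These are standard Linux commands, shown here as a quick diagnostic sketch:

# Run on each appliance node via SSH
df -h /     # disk utilization of the root filesystem
free -h     # memory utilization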
While this is normal behavior, it means that if services are restarted later (either manually or via an automated process), the required images are no longer present on the node, leading to the ErrImageNeverPull state.
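ErrImageNeverPull is Kubernetes' standard response when a container's imagePullPolicy is set to Never and the image is not present on the node. Assuming standard kubectl, you can confirm the policy on an affected pod:

kubectl get pod <pod-name> -n prelude -o jsonpath='{.spec.containers[*].imagePullPolicy}'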
A less common cause is a storage or network outage that can lead to a corrupted local image cache, producing the same symptoms.
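If you suspect cache corruption, you can list the images actually present on a node. The exact CLI depends on the container runtime in your appliance version, so treat both commands below as candidates rather than a guaranteed interface:

# On the affected node; use whichever CLI the runtime provides
docker images | grep <image-name>
# or, on containerd-based runtimes:
crictl images | grep <image-name>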
The solution is to restore the missing container images from the appliance's local archive on each affected node. This will allow Kubernetes to start the pods successfully.
First, identify which node each failing pod is assigned to. List the pods with the -o wide flag:
kubectl get pods -n prelude -o wide
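If the wide output is hard to scan, an optional custom-columns view (standard kubectl) shows just the pod-to-node mapping:

kubectl get pods -n prelude -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase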
Alternatively, you can check an individual pod with kubectl describe. Look for the "Node:" line in the output to see which appliance it's assigned to:
kubectl describe pod <pod-name> -n prelude
Next, log in to each appliance node (for example, vra-node-01, vra-node-02) where pods are in an ErrImageNeverPull or other error state, and run the image restore script:

/opt/scripts/restore_docker_images.sh

You must run this command on every node that has failing pods.
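If several nodes are affected, a small loop saves repetition. This is a sketch only: it assumes root SSH access to each node and uses the example hostnames from above, which you should replace with your own:

# Hypothetical hostnames; substitute the names of your affected nodes
for node in vra-node-01 vra-node-02; do
  ssh root@"$node" /opt/scripts/restore_docker_images.sh
done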
After running the script on each affected node, monitor the pods as they recover:

watch kubectl get pods -n prelude

The pods should transition from ErrImageNeverPull to ContainerCreating, and finally to a healthy Running state within a few minutes. Once all pods are running, the cluster will return to a healthy state.
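As a final spot check, the filter below (assuming a standard shell) hides the healthy statuses; any remaining line other than the header indicates a pod that still needs attention:

kubectl get pods -n prelude | grep -vE 'Running|Completed'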