Internet-less environment cluster downtime due to missing kube-dns images

Article ID: 298569

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Symptoms:

Rescheduled system pods cannot start in an internet-less (air-gapped) environment because the kube-dns images are missing.

 

Environment


Cause

In an internet-less (air-gapped) environment, if a worker node fills up, the Docker cleanup (garbage collection) that runs removes all unused Docker images. The cleanup does not currently differentiate between system images and non-system images, so the kube-dns images can be removed along with everything else. Because the environment is air-gapped, the images cannot be pulled again from k8s.gcr.io when a kube-dns pod is later scheduled onto that Kubernetes worker node, so the pod cannot start.

Resolution

To prevent this, pre-load the following container images into your private registry. The list must be updated for each release:
  • coredns.yml: image: coredns/coredns:1.2.0
  • heapster.yml: image: k8s.gcr.io/heapster-amd64:v1.5.4
  • influxdb.yml: image: k8s.gcr.io/heapster-influxdb-amd64:v1.3.3
  • kube-dns.yml: image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
  • kube-dns.yml: image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10
  • kube-dns.yml: image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
  • kubernetes-dashboard.yml: image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
  • metrics-server/metrics-server-deployment.yml: image: gcr.io/google_containers/metrics-server-amd64:v0.3.1
  • k8s.gcr.io/pause:3.1
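
The pre-loading described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a supported procedure: the registry host registry.example.local is a hypothetical placeholder, the helper names (mirror_name, run, mirror_all) are made up for illustration, and the image list is copied from this article and must be refreshed for each release. It is meant to run on a host that can reach both the public registries and the private one. By default it only prints the docker commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: mirror the system images listed in this article into a private registry.
# REGISTRY is a hypothetical placeholder; replace it with your registry host.
REGISTRY="${REGISTRY:-registry.example.local}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Image list copied from the article; update it for each release.
IMAGES="
coredns/coredns:1.2.0
k8s.gcr.io/heapster-amd64:v1.5.4
k8s.gcr.io/heapster-influxdb-amd64:v1.3.3
k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10
k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
gcr.io/google_containers/metrics-server-amd64:v0.3.1
k8s.gcr.io/pause:3.1
"

# Build the target name by prefixing the source image with the private registry.
mirror_name() {
  printf '%s/%s\n' "$REGISTRY" "$1"
}

# Print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Pull, retag, and push every image in the list; stop at the first failure.
mirror_all() {
  for image in $IMAGES; do
    target="$(mirror_name "$image")"
    run docker pull "$image" || return 1
    run docker tag "$image" "$target" || return 1
    run docker push "$target" || return 1
  done
}

mirror_all
```

Keeping the original image path under the registry prefix is one possible naming choice; how the cluster is pointed at the private registry is outside the scope of this sketch.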
The following steps can be used to reload the system images manually:
  1. Use BOSH SSH to connect to the worker node.
  2. Run /var/vcap/jobs/kubelet/bin/post-start
  3. Run docker images to verify all system images are now available.
  4. Run kubectl get pods -n=kube-system to verify the kube-dns pods are now running.
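
The image check in step 3 can be automated with a small helper. The following is a sketch only: the function name missing_images is hypothetical, and the required list is limited to the three kube-dns images named in this article as an example. The helper reads "repository:tag" lines on standard input and prints every required image that is absent.

```shell
#!/bin/sh
# Sketch: report which required system images are missing on a worker node.
# Example subset taken from this article; extend it to match your release.
REQUIRED_IMAGES="
k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10
k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
"

# Reads "repository:tag" lines on stdin and prints each required image
# that does not appear among them (exact, whole-line match).
missing_images() {
  present="$(cat)"
  for image in $REQUIRED_IMAGES; do
    if ! printf '%s\n' "$present" | grep -qxF "$image"; then
      echo "$image"
    fi
  done
}

# Intended use on the worker node, after running post-start:
#   docker images --format '{{.Repository}}:{{.Tag}}' | missing_images
```

Empty output means every image in the list is present. Any image names that are printed still need to be reloaded before the kube-dns pods can start on that node.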