DNS resolution suddenly timed out from the pod in TKGi

Article ID: 328592

Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

Symptoms:
DNS resolution from a pod in a TKGi cluster suddenly starts timing out. Example output from inside a pod:

pod-test:~# nslookup api.test.cluster.example.com <---------Replace with your cluster API server FQDN
;; communications error to 1#.1##.2##.2#53: timed out

pod-test:~# dig api.test.cluster.example.com <---------Replace with your cluster API server FQDN
;; communications error to 1#.1##.2##.2#53: timed out

pod-test:~# nslookup www.google.com <---------FQDN of reachable website
;; communications error to 1#.1##.2##.2#53: timed out

Cause

The coredns service and pods handle DNS resolution for the pod, because the nameserver in the pod's /etc/resolv.conf points to the coredns service:

pod-test:~# cat /etc/resolv.conf
      search default.svc.cluster.local svc.cluster.local cluster.local
      nameserver ##.###.###.# <-------IP of DNS server
      options ndots:5

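Even if the configured DNS servers are up, running, and reachable from the pod, the pod sends its queries to the coredns service first. If the coredns service or its pods are missing or unhealthy, every lookup from the pod times out.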
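To confirm that a pod really resolves through the cluster DNS service, the nameserver in its resolv.conf can be compared with the ClusterIP of the kube-dns service. The following is a minimal sketch, not an official procedure: the helper names first_nameserver and uses_cluster_dns are hypothetical, and it assumes the ClusterIP was looked up separately with kubectl.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TKGi or kubectl).

# Print the first nameserver listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no path is given.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Return 0 when the first nameserver in the file equals the given IP.
#   $1 = path to resolv.conf, $2 = ClusterIP of the kube-dns service
uses_cluster_dns() {
    [ "$(first_nameserver "$1")" = "$2" ]
}

# Usage (assumption: kubectl is configured for the affected cluster):
#   kubectl get svc kube-dns --namespace=kube-system -o jsonpath='{.spec.clusterIP}'
# then, inside the pod, with that IP as the second argument:
#   uses_cluster_dns /etc/resolv.conf "$ip" && echo "pod resolves through kube-dns"
```

If the two addresses match, the timeouts point at the coredns service or pods rather than at the external DNS servers.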

Resolution

Review the health of coredns pods and service.

1) Check if the coredns service exists

$  kubectl get svc --namespace=kube-system

    NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    antrea           ClusterIP   1#.1##.2##.#     <none>        443/TCP                  6h24m
    kube-dns         ClusterIP   1#.1##.2##.#     <none>        53/UDP,53/TCP,9153/TCP   6h16m
    metrics-server   ClusterIP   1#.1##.2##.1##   <none>        443/TCP                  6h16m

2) Check if the coredns pods are running

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

  NAME                       READY   STATUS    RESTARTS   AGE
  coredns-6f5c7f675f-qbh8t   1/1     Running   0          29m

3) Check the logs inside coredns pods

$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
  
  .:53
  [INFO] plugin/reload: Running configuration MD5 = 1d534941ad8884bb215680f48f8f5d2c
  CoreDNS-1.8.6
  linux/amd64, go1.19.5, v1.8.6+vmware.17
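The pod check in step 2 can be summarized with a small filter over the kubectl output. This is a rough sketch under stated assumptions: the function name coredns_pod_summary is hypothetical, and it only reads the default table output of kubectl get pods (NAME, READY, STATUS columns). It does not replace reviewing the service and the logs.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` table output on stdin and
# print how many of the listed pods are Running with all containers ready.
coredns_pod_summary() {
    awk 'NR > 1 && NF >= 3 {
             total++
             split($2, ready, "/")          # READY column, for example 1/1
             if ($3 == "Running" && ready[1] == ready[2]) ok++
         }
         END { printf "%d/%d coredns pods healthy\n", ok, total }'
}

# Usage (assumption: kubectl is configured for the affected cluster):
#   kubectl get pods --namespace=kube-system -l k8s-app=kube-dns | coredns_pod_summary
```

A result of 0/0 means that no coredns pods were listed at all, which corresponds to the missing-pods case described next.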

If the coredns pods are missing for a specific TKGi cluster, you can redeploy them by running the apply-addons errand with BOSH. Replace UUID with the cluster UUID, which can be retrieved from the output of the tkgi clusters command.

bosh -d service-instance_UUID run-errand apply-addons
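The command above can be wrapped so that the deployment name is built from the cluster UUID and previewed before it runs. This is only a sketch: the function names and the DRY_RUN variable are hypothetical conveniences, and the UUID still has to be copied from the tkgi clusters output.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the bosh or tkgi CLIs).

# Build the BOSH deployment name for a TKGi cluster from its UUID.
deployment_for_uuid() {
    printf 'service-instance_%s\n' "$1"
}

# Run the apply-addons errand for the cluster with the given UUID.
# With DRY_RUN=1 the command is printed instead of being executed.
reapply_addons() {
    deployment=$(deployment_for_uuid "$1")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bosh -d $deployment run-errand apply-addons"
    else
        bosh -d "$deployment" run-errand apply-addons
    fi
}

# Usage (the UUID below is a made-up placeholder):
#   DRY_RUN=1 reapply_addons aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
```

Previewing the command first helps avoid running the errand against the wrong deployment.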

Additional Information

Impact/Risks:
DNS resolution fails for all pods that use the coredns service.
