DNS resolution suddenly timed out from the pod in TKGi

Article ID: 328592

Updated On:

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:
DNS resolution suddenly times out from pods in a TKGi cluster. Output from inside a pod:
pod-test:~# nslookup api.tanzu-gss-labs.vmware.com
;; communications error to 10.100.200.2#53: timed out

pod-test:~# dig api.tanzu-gss-labs.vmware.com
;; communications error to 10.100.200.2#53: timed out

pod-test:~# nslookup www.vmware.com
;; communications error to 10.100.200.2#53: timed out
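To narrow down whether only the cluster DNS path is affected, it can help to query an upstream resolver directly from the same pod. The resolver address below is a placeholder; substitute a DNS server that is reachable from the pod network:

pod-test:~# dig @<upstream-dns-ip> www.vmware.com

If this query succeeds while queries against 10.100.200.2 still time out, the issue is on the coredns path rather than with the upstream DNS servers.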


Cause

The coredns pods (exposed through the kube-dns service) are used for DNS resolution, as configured in /etc/resolv.conf inside the pod:
pod-test:~# cat /etc/resolv.conf
      search default.svc.cluster.local svc.cluster.local cluster.local
      nameserver 10.100.200.2
      options ndots:5
Even if the upstream DNS servers configured for the cluster are up, running, and reachable from the pod, the pod will try the coredns service first.
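As a quick check (assuming the default ClusterFirst dnsPolicy), the nameserver in the pod's resolv.conf should match the ClusterIP of the kube-dns service:

$ kubectl get svc kube-dns --namespace=kube-system -o jsonpath='{.spec.clusterIP}'
10.100.200.2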

Resolution

Review the health of coredns pods and service.

1) Check if the coredns service exists (it is exposed as the kube-dns service)
$  kubectl get svc --namespace=kube-system

    NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    antrea           ClusterIP   10.100.200.3     <none>        443/TCP                  6h24m
    kube-dns         ClusterIP   10.100.200.2     <none>        53/UDP,53/TCP,9153/TCP   6h16m
    metrics-server   ClusterIP   10.100.200.149   <none>        443/TCP                  6h16m
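If the service exists, also confirm that it has ready endpoints behind it. The endpoint IP below is illustrative:

$ kubectl get endpoints kube-dns --namespace=kube-system

    NAME       ENDPOINTS                                         AGE
    kube-dns   10.200.29.4:53,10.200.29.4:53,10.200.29.4:9153    6h16m

An empty ENDPOINTS column means no ready coredns pods are backing the service.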

2) Check if the coredns pods are running
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

  NAME                       READY   STATUS    RESTARTS   AGE
  coredns-6f5c7f675f-qbh8t   1/1     Running   0          29m
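If a pod is not Running or is restarting, the following standard kubectl commands show where it is scheduled and any recent events:

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o wide
$ kubectl describe pod --namespace=kube-system -l k8s-app=kube-dns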


3) Check the logs of the coredns pods
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
  
  .:53
  [INFO] plugin/reload: Running configuration MD5 = 1d534941ad8884bb215680f48f8f5d2c
  CoreDNS-1.8.6
  linux/amd64, go1.19.5, v1.8.6+vmware.17
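If the logs show errors (for example upstream timeouts or loop detection) or the pod has restarted, the previous container logs and the CoreDNS configuration can also be reviewed. The pod name below is taken from step 2, and the ConfigMap is typically named coredns, although this can vary by TKGi version:

$ kubectl logs --namespace=kube-system coredns-6f5c7f675f-qbh8t --previous
$ kubectl get configmap coredns --namespace=kube-system -o yaml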



If the coredns pods are missing for a specific TKGi cluster, you can re-deploy them by running the apply-addons errand with BOSH, where the deployment UUID can be retrieved from the tkgi clusters command:
bosh -d service-instance_UUID run-errand apply-addons
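For example (the cluster name and UUID below are illustrative, and the column layout of tkgi clusters may vary by version):

$ tkgi clusters

    Name       Plan Name   UUID                                   Status     Action
    cluster-1  small       aa1234bb-56cc-78dd-90ee-ff1122334455   succeeded  CREATE

$ bosh -d service-instance_aa1234bb-56cc-78dd-90ee-ff1122334455 run-errand apply-addons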



Additional Information

Impact/Risks:
DNS resolution failure for pods that use the coredns service.