CoreDNS pods scheduled on the same control plane node are causing name resolution failures during the Guest Cluster Upgrade

Article ID: 421780

Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

  • During a TKGs or VKS guest cluster upgrade, a temporary but complete loss of internal DNS resolution (provided by CoreDNS) may occur. This happens when both replicas of the coredns deployment are initially scheduled onto the same control plane (CP) node. A quick way to confirm the DNS impact from inside the cluster is shown after this list.
  • List the nodes in the example cluster:

# kubectl get nodes

NAME                                      STATUS   ROLES           AGE   VERSION

example-cluster-####-brmcv                     Ready    control-plane   72m   v1.##.#+vmware.3-fips

example-cluster-####-j2fxs                     Ready    control-plane   68m   v1.##.#+vmware.3-fips

example-cluster-####-pv8p4                     Ready    control-plane   84m   v1.##.#+vmware.3-fips

example-cluster-node-pool-1-####-5c4b5-54ktg   Ready    <none>          75m   v1.##.#+vmware.3-fips

example-cluster-node-pool-1-####-5c4b5-8vjcd   Ready    <none>          74m   v1.##.#+vmware.3-fips

example-cluster-node-pool-1-####-5c4b5-r6954   Ready    <none>          74m   v1.##.#+vmware.3-fips

  • List the CoreDNS pods and their assigned nodes:

# kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES

coredns-5####cd-k6pdh   1/1     Running   0          73m   ##.##.##.##   example-cluster-####-pv8p4   <none>           <none>

coredns-59###bcd-rwrls   1/1     Running   0          82m   ##.##.#.##   example-cluster-####-pv8p4   <none>           <none>

  • Both CoreDNS pods are currently co-located on a single node, example-cluster-####-pv8p4.
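
  • If the DNS impact needs to be confirmed from inside the cluster, a short-lived test pod can be used. The pod name, image, and lookup target below are examples only; any image that includes nslookup will work:

# kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local

  • While both CoreDNS pods are unavailable (for example, while their shared node is drained during the upgrade), this lookup will time out instead of resolving.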

Cause

This occurs because, during the upgrade, all coredns replicas can be evicted at the same time and rescheduled without any explicit scheduling constraint. By default, the coredns deployment does not include a Kubernetes pod anti-affinity policy to prevent its replicas from being scheduled on the same node. Without this policy, the kube-scheduler is free to place both pods on the same node, which breaks the assumption of high availability across nodes.
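
For reference, a pod anti-affinity stanza of the following shape (under spec.template.spec in the deployment) is the kind of constraint that would keep the replicas on separate nodes. It is shown only to illustrate the missing setting; the coredns deployment in TKGs/VKS is lifecycle-managed, so manual edits may be reverted by the platform:

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100                             # prefer, but do not require, spreading
          podAffinityTerm:
            labelSelector:
              matchLabels:
                k8s-app: kube-dns                 # match the other CoreDNS replicas
            topologyKey: kubernetes.io/hostname   # spread across individual nodes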

Resolution

This issue originates upstream in CoreDNS. Broadcom engineering is aware of it and is actively working to improve the handling of CoreDNS placement constraints, especially during cluster lifecycle operations.

Workaround:

To rebalance the CoreDNS pods onto different nodes, manually delete one of the pods before initiating the guest cluster upgrade. The kube-controller-manager will immediately create a replacement pod, and the kube-scheduler will attempt to place it on a different available, healthy node:

  1. Verify current pod locations: Check which nodes are currently hosting the CoreDNS pods.

    kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

  2. Trigger re-scheduling: If both pods are on the same node, delete one pod to force kube-scheduler to reschedule it. Replace the example pod name with your actual pod name.

    kubectl delete po coredns-5####cd-k6pdh -n kube-system

  3. Verify re-scheduling: Confirm that the new pod has been scheduled onto a different node by running the get pods command again. A one-line check of the node assignments is also shown after these steps.

    kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

    NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
    coredns-53###zz-k7gmo   1/1     Running   0          1m   ##.##.##.##   example-cluster-####-j2fxs   <none>           <none>
    coredns-59###bcd-rwrls   1/1     Running   0          82m   ##.##.##.##   example-cluster-####-pv8p4   <none>           <none>
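
Optionally, list the node assignment for each CoreDNS pod in a single command; two distinct node names confirm that the pods are spread across nodes (this assumes the standard k8s-app=kube-dns label):

    kubectl get pods -n kube-system -l k8s-app=kube-dns -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName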