TKGs - TKC cluster fails to update due to TLS handshake errors between Supervisor's controller-managers and etcd

Article ID: 345905

Products

VMware vSphere ESXi, VMware vSphere Kubernetes Service

Issue/Introduction

  • TKC cluster shows the Ready state as "False" after an update, even though the VMs and Machines show as running (the status check sketched after the log excerpt below can confirm this).
  • TKC cluster fails to update after the TKC definition is modified, e.g. changing the number of node replicas, changing the VM classes, etc.
  • These changes are reflected in the TKC object, but they are not propagated to the other CAPI objects such as Machines, vSphereMachines, etc.
  • "capi-kubeadm-control-plane-controller-manager" and "capi-controller-manager" logs show TLS handshake errors between the controller-managers and the etcd endpoints, for example:

<timestamp>.56448132Z stderr F 2023/10/27 10:02:28 http: TLS handshake error from <SV_etcd_IP>:35114: read tcp <controller-manager_IP>:9874-><SV_etcd_IP>:35114: read: connection reset by peer

<timestamp>.437517695Z stderr F 2023/10/27 10:08:14 http: TLS handshake error from <SV_etcd_IP>:33405: EOF
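
To confirm these symptoms, the TKC object's conditions can be inspected from a Supervisor ControlPlane node. A minimal sketch, where <tkc_name> and <tkc_namespace> are placeholders and "tkc" is the short name for the tanzukubernetescluster resource:

kubectl get tkc <tkc_name> -n <tkc_namespace>
kubectl describe tkc <tkc_name> -n <tkc_namespace>

While the cluster is in this state, the Ready condition in the describe output reports "False".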

 

Environment

VMware vSphere 7.0 with Tanzu

Cause

  • TLS handshake errors can have many different causes. They indicate that, at least at the time the TKC cluster was updated, the controller-managers were unable to communicate with the Supervisor's etcd cluster.
  • This can be caused by an etcd issue, a temporary network problem, high resource utilization on the Supervisor ControlPlane nodes where the controller-manager pods are running, etc. (a quick check for the last cause is sketched after this list).
  • Sometimes, even after connectivity is reestablished, the TKC cluster update operation remains hung, and the controller-manager pods must be recreated to force the creation of new TLS certificates.
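
As a quick check on the resource-utilization cause, node pressure can be inspected from a Supervisor ControlPlane node. A minimal sketch, assuming the metrics API is available in the environment and with <sv_node_name> as a placeholder:

kubectl top nodes
kubectl describe node <sv_node_name> | grep -i pressure

The describe output lists the MemoryPressure, DiskPressure, and PIDPressure conditions; any of them reporting "True" points to resource starvation on that node.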

Resolution

  • First, make sure that the Supervisor's etcd cluster is healthy. From any of the Supervisor ControlPlane nodes, run:
    • etcdctl member list -w table
    • etcdctl endpoint health --cluster -w table
    • etcdctl endpoint status --cluster -w table

Expected output from a healthy etcd cluster: every member shows a "started" status in the member list, every endpoint reports HEALTH "true" with an empty ERROR column, and the endpoint status table shows exactly one leader.

 
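For illustration only (the IPs and timings below are hypothetical, and the exact columns vary with the etcd version), "etcdctl endpoint health --cluster -w table" on a healthy three-node cluster returns output similar to:

+-----------------------+--------+-------------+-------+
|       ENDPOINT        | HEALTH |    TOOK     | ERROR |
+-----------------------+--------+-------------+-------+
| https://10.0.0.1:2379 | true   | 9.538109ms  |       |
| https://10.0.0.2:2379 | true   | 10.182430ms |       |
| https://10.0.0.3:2379 | true   | 10.526461ms |       |
+-----------------------+--------+-------------+-------+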

  • Once the etcd cluster is verified healthy, scale down and back up the "capi-kubeadm-control-plane-controller-manager" and "capi-controller-manager" deployments so that their pods (and TLS certificates) are recreated. Run the following commands from a Supervisor ControlPlane node.

capi-kubeadm-control-plane-controller-manager:

  1. kubectl get deploy,rs,po -n vmware-system-capw -> Note down the number of replicas for "capi-kubeadm-control-plane-controller-manager" and "capi-controller-manager" deployments.
  2. kubectl scale deploy capi-kubeadm-control-plane-controller-manager --replicas=0 -n vmware-system-capw
  3. kubectl get deploy,rs,po -n vmware-system-capw -> Wait for the pods to be terminated.
  4. kubectl scale deploy capi-kubeadm-control-plane-controller-manager --replicas=<number_of_original_replicas_from_step_1> -n vmware-system-capw
  5. kubectl get deploy,rs,po -n vmware-system-capw -> Wait for the new pods to be up and running.

capi-controller-manager:

  1. kubectl get deploy,rs,po -n vmware-system-capw -> Note down the number of replicas for "capi-kubeadm-control-plane-controller-manager" and "capi-controller-manager" deployments.
  2. kubectl scale deploy capi-controller-manager --replicas=0 -n vmware-system-capw
  3. kubectl get deploy,rs,po -n vmware-system-capw -> Wait for the pods to be terminated.
  4. kubectl scale deploy capi-controller-manager --replicas=<number_of_original_replicas_from_step_1> -n vmware-system-capw
  5. kubectl get deploy,rs,po -n vmware-system-capw -> Wait for the new pods to be up and running.
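
Note: on recent kubectl versions (1.15 and later), a rolling restart recreates the pods in one step per deployment, without scaling to zero. Whether this is preferable is a judgment call for the environment; a sketch:

kubectl rollout restart deploy capi-kubeadm-control-plane-controller-manager -n vmware-system-capw
kubectl rollout restart deploy capi-controller-manager -n vmware-system-capw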

Example:

root@422109d8aba074bc4766915bb387413a [ ~ ]# k get deploy,rs,po -n vmware-system-capw
NAME                                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/capi-controller-manager                          2/2     2            2           14d
deployment.apps/capi-kubeadm-bootstrap-controller-manager        2/2     2            2           14d
deployment.apps/capi-kubeadm-control-plane-controller-manager    2/2     2            2           14d
deployment.apps/capv-controller-manager                          2/2     2            2           14d
deployment.apps/capw-controller-manager                          2/2     2            2           14d
deployment.apps/capw-webhook                                     2/2     2            2           14d

NAME                                                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/capi-controller-manager-5dcf5f57f8                          2         2         2       14d
replicaset.apps/capi-kubeadm-bootstrap-controller-manager-77f74f899c        2         2         2       14d
replicaset.apps/capi-kubeadm-control-plane-controller-manager-6d5667dc69    2         2         2       14d
replicaset.apps/capv-controller-manager-65b579946b                          2         2         2       14d
replicaset.apps/capw-controller-manager-88875dc69                           2         2         2       14d
replicaset.apps/capw-webhook-6cf4bcbc4b                                     2         2         2       14d

NAME                                                                  READY   STATUS    RESTARTS       AGE
pod/capi-controller-manager-5dcf5f57f8-2b657                          2/2     Running   0              85m
pod/capi-controller-manager-5dcf5f57f8-zwh9c                          2/2     Running   0              85m
pod/capi-kubeadm-bootstrap-controller-manager-77f74f899c-pbtnh        2/2     Running   31 (27h ago)   14d
pod/capi-kubeadm-bootstrap-controller-manager-77f74f899c-xq6j7        2/2     Running   38 (27h ago)   14d
pod/capi-kubeadm-control-plane-controller-manager-6d5667dc69-4rs6k    2/2     Running   0              91m
pod/capi-kubeadm-control-plane-controller-manager-6d5667dc69-k7nzj    2/2     Running   0              91m
pod/capv-controller-manager-65b579946b-kcpqq                          1/1     Running   52 (27h ago)   14d
pod/capv-controller-manager-65b579946b-r2wgp                          1/1     Running   53 (27h ago)   14d
pod/capw-controller-manager-88875dc69-fb985                           2/2     Running   51 (27h ago)   14d
pod/capw-controller-manager-88875dc69-qs7fd                           2/2     Running   47 (27h ago)   14d
pod/capw-webhook-6cf4bcbc4b-6ck5k                                     2/2     Running   1 (14d ago)    14d
pod/capw-webhook-6cf4bcbc4b-bj4fc                                     2/2     Running   0              14d

root@422109d8aba074bc4766915bb387413a [ ~ ]# k scale deploy capi-kubeadm-control-plane-controller-manager --replicas=0 -n vmware-system-capw
deployment.apps/capi-kubeadm-control-plane-controller-manager scaled

root@422109d8aba074bc4766915bb387413a [ ~ ]# k get deploy,rs,po -n vmware-system-capw
NAME                                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/capi-controller-manager                          2/2     2            2           14d
deployment.apps/capi-kubeadm-bootstrap-controller-manager        2/2     2            2           14d
deployment.apps/capi-kubeadm-control-plane-controller-manager    0/0     0            0           14d
deployment.apps/capv-controller-manager                          2/2     2            2           14d
deployment.apps/capw-controller-manager                          2/2     2            2           14d
deployment.apps/capw-webhook                                     2/2     2            2           14d

NAME                                                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/capi-controller-manager-5dcf5f57f8                          2         2         2       14d
replicaset.apps/capi-kubeadm-bootstrap-controller-manager-77f74f899c        2         2         2       14d
replicaset.apps/capi-kubeadm-control-plane-controller-manager-6d5667dc69    0         0         0       14d
replicaset.apps/capv-controller-manager-65b579946b                          2         2         2       14d
replicaset.apps/capw-controller-manager-88875dc69                           2         2         2       14d
replicaset.apps/capw-webhook-6cf4bcbc4b                                     2         2         2       14d

NAME                                                                  READY   STATUS    RESTARTS       AGE
pod/capi-controller-manager-5dcf5f57f8-2b657                          2/2     Running   0              87m
pod/capi-controller-manager-5dcf5f57f8-zwh9c                          2/2     Running   0              87m
pod/capi-kubeadm-bootstrap-controller-manager-77f74f899c-pbtnh        2/2     Running   31 (27h ago)   14d
pod/capi-kubeadm-bootstrap-controller-manager-77f74f899c-xq6j7        2/2     Running   38 (27h ago)   14d
pod/capv-controller-manager-65b579946b-kcpqq                          1/1     Running   52 (27h ago)   14d
pod/capv-controller-manager-65b579946b-r2wgp                          1/1     Running   53 (27h ago)   14d
pod/capw-controller-manager-88875dc69-fb985                           2/2     Running   51 (27h ago)   14d
pod/capw-controller-manager-88875dc69-qs7fd                           2/2     Running   47 (27h ago)   14d
pod/capw-webhook-6cf4bcbc4b-6ck5k                                     2/2     Running   1 (14d ago)    14d
pod/capw-webhook-6cf4bcbc4b-bj4fc                                     2/2     Running   0              14d

root@422109d8aba074bc4766915bb387413a [ ~ ]# k scale deploy capi-kubeadm-control-plane-controller-manager --replicas=2 -n vmware-system-capw
deployment.apps/capi-kubeadm-control-plane-controller-manager scaled

root@422109d8aba074bc4766915bb387413a [ ~ ]# k get deploy,rs,po -n vmware-system-capw
NAME                                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/capi-controller-manager                          2/2     2            2           14d
deployment.apps/capi-kubeadm-bootstrap-controller-manager        2/2     2            2           14d
deployment.apps/capi-kubeadm-control-plane-controller-manager    2/2     2            2           14d
deployment.apps/capv-controller-manager                          2/2     2            2           14d
deployment.apps/capw-controller-manager                          2/2     2            2           14d
deployment.apps/capw-webhook                                     2/2     2            2           14d

NAME                                                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/capi-controller-manager-5dcf5f57f8                          2         2         2       14d
replicaset.apps/capi-kubeadm-bootstrap-controller-manager-77f74f899c        2         2         2       14d
replicaset.apps/capi-kubeadm-control-plane-controller-manager-6d5667dc69    2         2         2       14d
replicaset.apps/capv-controller-manager-65b579946b                          2         2         2       14d
replicaset.apps/capw-controller-manager-88875dc69                           2         2         2       14d
replicaset.apps/capw-webhook-6cf4bcbc4b                                     2         2         2       14d

NAME                                                                  READY   STATUS    RESTARTS       AGE
pod/capi-controller-manager-5dcf5f57f8-2b657                          2/2     Running   0              90m
pod/capi-controller-manager-5dcf5f57f8-zwh9c                          2/2     Running   0              90m
pod/capi-kubeadm-bootstrap-controller-manager-77f74f899c-pbtnh        2/2     Running   31 (27h ago)   14d
pod/capi-kubeadm-bootstrap-controller-manager-77f74f899c-xq6j7        2/2     Running   38 (27h ago)   14d
pod/capi-kubeadm-control-plane-controller-manager-6d5667dc69-5vb78    2/2     Running   0              2m18s
pod/capi-kubeadm-control-plane-controller-manager-6d5667dc69-cmp4d    2/2     Running   0              2m18s
pod/capv-controller-manager-65b579946b-kcpqq                          1/1     Running   52 (27h ago)   14d
pod/capv-controller-manager-65b579946b-r2wgp                          1/1     Running   53 (27h ago)   14d
pod/capw-controller-manager-88875dc69-fb985                           2/2     Running   51 (27h ago)   14d
pod/capw-controller-manager-88875dc69-qs7fd                           2/2     Running   47 (27h ago)   14d
pod/capw-webhook-6cf4bcbc4b-6ck5k                                     2/2     Running   1 (14d ago)    14d
pod/capw-webhook-6cf4bcbc4b-bj4fc                                     2/2     Running   0              14d

  • Check that there are no more TLS handshake errors in the "capi-kubeadm-control-plane-controller-manager" and "capi-controller-manager" logs (a sketch that scans all of their pods at once follows this list):
    • kubectl logs capi-kubeadm-control-plane-controller-manager-<rs_hash>-<pod_suffix> -c manager -n vmware-system-capw | grep -i tls
    • kubectl logs capi-controller-manager-<rs_hash>-<pod_suffix> -c manager -n vmware-system-capw | grep -i tls
  • Check that the TKC cluster update completes successfully.
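
To scan all replicas of both deployments in one pass, a minimal sketch (the one-hour --since window is an arbitrary choice):

for pod in $(kubectl get po -n vmware-system-capw -o name | grep -E 'capi-controller-manager|capi-kubeadm-control-plane-controller-manager'); do
  echo "== ${pod} =="                                     # show which pod the lines below belong to
  kubectl logs "${pod}" -c manager -n vmware-system-capw --since=1h | grep -i "tls handshake" || echo "no TLS handshake errors"
done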

If the cluster's "Ready" status still shows "False", restart the wcp service on the vCenter Server:

service-control --restart wcp
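
The service status can then be confirmed with:

service-control --status wcp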