Identifying component leaders of TKGI components

Article ID: 370764

Updated On: 06-25-2024

Products

  • VMware Tanzu Kubernetes Grid Integrated (TKGi)
  • VMware Tanzu Kubernetes Grid Integrated Edition
  • VMware Tanzu Kubernetes Grid Integrated Edition (Core)
  • VMware Tanzu Kubernetes Grid Integrated Edition 1.x
  • VMware Tanzu Kubernetes Grid Integrated Edition Starter Pack (Core)

Issue/Introduction

Several TKGI components operate in a leader/follower mode. In this high-availability pattern, the leader is the entry point for requests and coordinates tasks with the followers. The components in this category are:

  • ETCD
  • NCP
  • Kubernetes Controller Manager
  • Kubernetes Scheduler
  • CSI Components

In an environment with multiple control plane and worker nodes, knowing which replica is the leader is important for troubleshooting and log review. For the components below, leader election uses the Lease API from the coordination.k8s.io API group: the leading replica holds a Lease object and must keep renewing it before the lease duration (Lease Duration Seconds) expires.

  • Kubernetes controller manager
  • Kubernetes Scheduler
  • CSI Components
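The fields behind this mechanism can be read directly from a Lease object. The following is a minimal sketch, assuming kubectl is already pointed at the cluster; the function name show_lease is hypothetical, and the field names (holderIdentity, leaseDurationSeconds, renewTime) are those of the coordination.k8s.io/v1 Lease spec.

```shell
# Print the holder, lease duration (seconds), and last renewal time of one
# Lease object, separated by tabs.
# Usage: show_lease <namespace> <lease-name>
# show_lease is a hypothetical helper name, not part of kubectl or TKGI.
show_lease() {
  kubectl get lease "$2" -n "$1" \
    -o jsonpath='{.spec.holderIdentity}{"\t"}{.spec.leaseDurationSeconds}{"\t"}{.spec.renewTime}{"\n"}'
}
```

For example, show_lease kube-system kube-scheduler prints one line for the scheduler lease. If the renewal time stops advancing for longer than the lease duration, another replica can take over the lease.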

Resolution

Identify the leaseholder

kubectl get leases.coordination.k8s.io -A | grep -v node

NAMESPACE           NAME                                              HOLDER                                                                      AGE
kube-system         kube-controller-manager                           ad975454-1101-4a24-b2fa-25705d3b9dc0_faf633cc-0d5a-4b8a-ba45-c85bbbd50024   127m
kube-system         kube-scheduler                                    ad975454-1101-4a24-b2fa-25705d3b9dc0_8109191c-1eb4-4d13-967b-1735e19086fb   127m
vmware-system-csi   csi-vsphere-vmware-com                            ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
vmware-system-csi   external-attacher-leader-csi-vsphere-vmware-com   ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
vmware-system-csi   external-resizer-csi-vsphere-vmware-com           ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
vmware-system-csi   vsphere-syncer                                    ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m

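The HOLDER column shows the hostname of the VM that holds each lease. For kube-controller-manager and kube-scheduler, the hostname is followed by an underscore and a unique suffix. These hostnames are not Kubernetes node names; they are the hostnames of the BOSH-deployed VMs. To map the hostnames to BOSH instances, run: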
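To compare holders with VM hostnames, it can help to strip the suffix from the HOLDER column first. The sketch below assumes the four-column layout shown in the lease listing above; the function name lease_holders is hypothetical.

```shell
# Read `kubectl get leases.coordination.k8s.io -A` output on stdin and print
# "<lease name> <hostname>" per lease, dropping any "_<suffix>" from HOLDER.
# lease_holders is a hypothetical helper name.
lease_holders() {
  awk '$1 != "NAMESPACE" { split($3, parts, "_"); print $2, parts[1] }'
}
```

Usage: kubectl get leases.coordination.k8s.io -A | grep -v node | lease_holders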

bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master hostname | egrep -v 'subject|to|use'

master/a2cb06fc-c6d2-477c-bdfb-6212591b38c6: stdout | 6e2aa260-2ec5-4537-9133-46192d858a3b
master/31c0f1f6-2104-4479-a4e3-39ed63aadc5c: stdout | f8ad35c5-198d-46c8-bdb7-bbf610b81329
master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b: stdout | ad975454-1101-4a24-b2fa-25705d3b9dc0

The output above shows that all the leases in this environment are held by the VM with hostname ad975454-1101-4a24-b2fa-25705d3b9dc0, which is master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b. The replicas running on this VM are therefore the leaders for these components. Use bosh ssh to connect to this VM and review the logs.
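On clusters with many control plane VMs, the lookup from hostname to BOSH instance can be scripted. This is a minimal sketch that assumes the "instance: stdout | hostname" line format shown in the bosh ssh output above; the function name instance_for_hostname is hypothetical.

```shell
# Read lines like "master/<id>: stdout | <hostname>" on stdin and print the
# BOSH instance whose hostname equals the first argument.
# instance_for_hostname is a hypothetical helper name.
instance_for_hostname() {
  awk -v host="$1" '$2 == "stdout" && $4 == host { sub(/:$/, "", $1); print $1 }'
}
```

Usage: pipe the output of the bosh ssh hostname command into instance_for_hostname, passing the holder hostname as the argument.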

Identifying ETCD leader

The command below reports the etcd leader, which in this example is master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b:

bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master/0 "ETCDCTL_API=3 /var/vcap/jobs/etcd/bin/etcdctl endpoint status" | egrep -v 'subject|to|use' | grep true

master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b: stdout | https://master-0.etcd.cfcr.internal:2379, 17f206fd866fdab2, 3.5.4, 5.5 MB, true, false, 4, 28536, 28536,
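A plain grep true matches any line that contains "true", including one where only the learner flag is set. The sketch below is a stricter filter, assuming the comma-separated layout shown in the output above, where the fifth field is the leader flag and the sixth is the learner flag; the function name etcd_leader is hypothetical.

```shell
# Read `etcdctl endpoint status` lines (as printed through bosh ssh) on stdin
# and print "<BOSH instance> <endpoint>" for the member whose leader flag,
# assumed to be the fifth comma-separated field, is "true".
# etcd_leader is a hypothetical helper name.
etcd_leader() {
  awk -F', ' '$5 == "true" {
    split($1, head, " ")        # head: instance, "stdout", "|", endpoint
    sub(/:$/, "", head[1])      # drop the trailing colon after the instance
    print head[1], head[4]
  }'
}
```

Usage: replace the final grep true in the command above with etcd_leader.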

Identifying NCP master

To identify the NCP leader, use the nsxcli tool available on the Kubernetes cluster master VMs:

bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master "sudo /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status" | egrep -v 'subject|to|use' | grep "This instance is the NCP master"

master/31c0f1f6-2104-4479-a4e3-39ed63aadc5c: stdout | This instance is the NCP master