Message: Following machines are reporting unknown etcd member status: <cluster-name-control-plane-A>,<cluster-name-control-plane-B>,<cluster-name-control-plane-C>
YYYY-MM-DDTHH:MM:SS. stderr F E0308 12:36:45.007061 1 controller.go:326] "Reconciler error" err="failed to get etcdStatus for workload cluster <cluster-name>: failed to create etcd client: could not establish a connection to any etcd node: unable to create etcd client: context deadline exceeded" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" kubeadmControlPlane="<namespace/cluster-name>" namespace="<namespace-name>" name="<cluster-name>" reconcileID=<ID>
YYYY-MM-DDTHH:MM:SS. stderr F I0116 22:15:01.534582 1 remediation.go:286] "etcd cluster projected after remediation of cluster-CP-VM-name" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" kubeadmControlPlane="namespace-name/cluster-name" namespace="namespace-name" name="cluster-name" reconcileID=...
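To see the condition behind this message, inspect the KubeadmControlPlane object on the Supervisor. A minimal sketch, assuming you are logged into the Supervisor cluster context and substituting the placeholders from the log line above:
kubectl get kubeadmcontrolplane -n <namespace-name>
kubectl describe kubeadmcontrolplane <cluster-name> -n <namespace-name>
The conditions in the describe output include the etcd member status reported in the message above.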
A health check against the control plane service (svc) of the Guest cluster returned status 'OK', which means the Control Plane nodes are able to communicate with each other.
Environment: vSphere with Tanzu
Older versions of CAPI have difficulty managing unresponsive guest clusters.
If any of the following controllers on the Supervisor is in an error state (Terminating or Pending), guest cluster reconciliation gets queued (a quick way to check their state is shown after the list):
capi-controller-manager
capi-kubeadm-control-plane-controller-manager
capi-kubeadm-bootstrap-controller-manager
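To check the state of these controllers, list their pods on the Supervisor; a minimal sketch, assuming the controllers run in the vmware-system-capw namespace used in the restart steps later in this article:
kubectl get pods -n vmware-system-capw | grep -i capi
A pod stuck in Terminating, Pending, or CrashLoopBackOff here explains queued guest cluster reconciliation.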
Example (run from a Guest cluster Control Plane node):
root@<guest-cluster-CP-node> [ ~ ]# etcdctl --cluster=true endpoint health -w table
+-----------------------------+--------+------------+-------+
|          ENDPOINT           | HEALTH |    TOOK    | ERROR |
+-----------------------------+--------+------------+-------+
| https://<CP-node-1-IP>:2379 | true   | 4.671596ms |       |
| https://<CP-node-2-IP>:2379 | true   | 7.120376ms |       |
| https://<CP-node-3-IP>:2379 | true   | 7.356998ms |       |
+-----------------------------+--------+------------+-------+
root@<guest-cluster-CP-node> [ ~ ]# etcdctl member list -w table
+------------------+---------+-------------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  |    NAME     |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------------+-----------------------------+-----------------------------+------------+
| <member-1-ID>    | started | <CP-node-1> | https://<CP-node-1-IP>:2380 | https://<CP-node-1-IP>:2379 | false      |
| <member-2-ID>    | started | <CP-node-2> | https://<CP-node-2-IP>:2380 | https://<CP-node-2-IP>:2379 | false      |
| <member-3-ID>    | started | <CP-node-3> | https://<CP-node-3-IP>:2380 | https://<CP-node-3-IP>:2379 | false      |
+------------------+---------+-------------+-----------------------------+-----------------------------+------------+
root@<guest-cluster-CP-node> [ ~ ]# etcdctl --cluster=true endpoint status -w table
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://<CP-node-1-IP>:2379 | <member-1-ID>    | 3.5.11  | 176 MB  | false     | false      |        83 |  584410086 |          584410086 |        |
| https://<CP-node-2-IP>:2379 | <member-2-ID>    | 3.5.11  | 176 MB  | true      | false      |        83 |  584410086 |          584410086 |        |
| https://<CP-node-3-IP>:2379 | <member-3-ID>    | 3.5.11  | 176 MB  | false     | false      |        83 |  584410086 |          584410086 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
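Note: If etcdctl returns TLS or authorization errors instead of the tables above, it usually needs the etcd client certificates passed explicitly. A minimal sketch, assuming the standard kubeadm certificate locations (these paths are an assumption, not taken from this article; verify them on your node before running):
# Certificate paths below are the kubeadm defaults; confirm they exist on your node.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health -w table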
On the Supervisor, list the services in the Guest cluster's namespace and note the External IP of the control plane service:
kubectl get svc -n <namespace>
Example:
root@<Supervisor CP node name> [ ~ ]# kubectl get svc -n namespace01
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tkc-xsmall-control-plane-service LoadBalancer ##.##.#.## ###.###.#.### 6443/TCP 44d
Check the health of the Cluster External IP:
curl -k https://<EXTERNAL-IP obtained from above step>:6443/healthz
Example output:
root@<Supervisor CP node name> [ ~ ]# curl -k https://<EXTERNAL-IP>:6443/healthz
ok
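For a per-check breakdown, the API server also exposes the standard Kubernetes /readyz endpoint; appending ?verbose prints each individual health check (this is upstream Kubernetes behavior, not specific to vSphere with Tanzu):
curl -k "https://<EXTERNAL-IP>:6443/readyz?verbose"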
Restart the CAPI controllers by scaling the replicas of their deployments down and back up.
Note: Before initiating a restart, verify the status of all controllers. If any controller is in an error state, collect the WCP logs before proceeding with the restart.
kubectl get pods -A | grep -i capi
kubectl get deployment -n vmware-system-capw | grep capi
kubectl scale deployment -n vmware-system-capw --replicas=0 capi-controller-manager
kubectl scale deployment -n vmware-system-capw --replicas=2 capi-controller-manager
kubectl scale deployment -n vmware-system-capw --replicas=0 capi-kubeadm-bootstrap-controller-manager
kubectl scale deployment -n vmware-system-capw --replicas=2 capi-kubeadm-bootstrap-controller-manager
kubectl scale deployment -n vmware-system-capw --replicas=0 capi-kubeadm-control-plane-controller-manager
kubectl scale deployment -n vmware-system-capw --replicas=2 capi-kubeadm-control-plane-controller-manager
Note: Make sure the Guest cluster etcd and API server/load balancer are healthy (as verified in the earlier steps) before attempting the restart.
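After scaling the deployments back up, confirm the controllers have returned to a Running state before expecting reconciliation to resume; a minimal verification sketch using the same deployments listed above:
kubectl rollout status deployment -n vmware-system-capw capi-controller-manager
kubectl rollout status deployment -n vmware-system-capw capi-kubeadm-bootstrap-controller-manager
kubectl rollout status deployment -n vmware-system-capw capi-kubeadm-control-plane-controller-manager
kubectl get pods -n vmware-system-capw | grep -i capi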