Symptoms:
"The connection to the server vra-k8s.local:6443 was refused - did you specify the right host or port?"
/health error; no leader (status code 503)
error: You must be logged in to the server (Unauthorized)
Unable to connect to the server: x509: certificate has expired or is not yet valid
Jan 26 11:46:40 applianceFQDN.vmware.com kubelet[5669]: F0126 11:46:40.942105 5669 server.go:266] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jan 26 11:46:40 applianceFQDN.vmware.com kubelet[5669]: E0126 11:46:40.941998 5669 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2021-01-16 17:13:45 +0000 UTC
Status: "exit status is 255"
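The "certificate has expired or is not yet valid" symptom can be confirmed directly with openssl. The kubelet certificate path in the comment is an assumption (not confirmed by this article); the snippet generates a throwaway certificate so the check itself is runnable anywhere:

```shell
# Generate a throwaway 1-day certificate purely for demonstration.
# Against a live appliance, point openssl at the real kubelet client
# certificate instead, e.g. /var/lib/kubelet/pki/kubelet-client-current.pem
# (that path is an assumption, not taken from this article).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -days 1 -subj "/CN=kubelet-demo" 2>/dev/null

# -checkend 0 exits non-zero once the certificate has expired
if openssl x509 -in /tmp/demo.crt -noout -checkend 0 >/dev/null; then
  echo "certificate still valid"
else
  echo "certificate expired"
fi

# Print the exact expiry timestamp (notAfter=...)
openssl x509 -in /tmp/demo.crt -noout -enddate
```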
The issue has at least two distinct causes:
Kubelet certificate rollover is handled automatically in vRealize Automation and vRealize Orchestrator 8.2 and above.
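On versions prior to 8.2, the expiry surfaces only in the kubelet journal; the timestamp can be pulled out of that log line with standard text tools. The sample line below is copied from the symptoms above; on a live appliance you would feed `journalctl -u kubelet` in instead:

```shell
# Sample journal line taken from the symptoms section. On an appliance:
#   journalctl -u kubelet | grep 'bootstrap client certificate is expired'
sample='Jan 26 11:46:40 applianceFQDN.vmware.com kubelet[5669]: E0126 11:46:40.941998 5669 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2021-01-16 17:13:45 +0000 UTC'

# Keep only the expiry timestamp that follows "expired: "
echo "$sample" | sed -n 's/.*expired: //p'
# prints: 2021-01-16 17:13:45 +0000 UTC
```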
For etcd corruption issues on a single node or in a cluster, the following instructions remain valid:
kubectl get vaconfig -o yaml | tee /root/vaconfig.yaml
vracli cluster leave
kubectl apply -f /root/vaconfig.yaml --force
/opt/scripts/deploy.sh
vracli cluster leave
/opt/scripts/recover_etcd.sh --confirm /root/backup-12345
vracli etcd restore --local --confirm /root/backup-123456789.db; systemctl start etcd
kubectl get vaconfig -o yaml | tee /root/vaconfig.yaml
vracli cluster leave
kubectl apply -f /root/vaconfig.yaml --force
vracli cluster join [primary-node] --preservedata
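After `vracli cluster join` completes, the API server may need several minutes before `kubectl` answers again, so a small poll loop is useful before declaring the recovery done. The `check` function below is a stand-in stub so the sketch runs anywhere; on an appliance you would replace its body with a real probe such as `kubectl get nodes`:

```shell
# Stand-in probe so the sketch is self-contained. On an appliance use:
#   check() { kubectl get nodes >/dev/null 2>&1; }
check() { true; }

# Poll up to 30 times, 10 seconds apart, until the API server answers
for i in $(seq 1 30); do
  if check; then
    echo "API server is answering"
    break
  fi
  sleep 10
done
```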
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
https://github.com/kubernetes/kubeadm/issues/1753
Note: