The Aria Automation UI fails to load, and checking the Kubernetes node or service status returns the error "The connection to the server vra-k8s.local:6443 was refused - did you specify the right host or port?"
The kubelet service fails to come online (Active) after a node reboot or a Kubernetes reinitialization, and reports the status "exit status is 255".
Running journalctl -xeu kubelet returns entries similar to:
Aug 14 01:25:40 applianceFQDN.vmware.com kubelet[5669]: F0126 01:25:40.942105 5669 server.go:266] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Aug 14 01:25:40 applianceFQDN.vmware.com kubelet[5669]: E0126 01:25:40.941998 5669 bootstrap.go:264] Part of the existing bootstrap client certificate is expired
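The two log entries above can be searched for automatically rather than by eye. The following is a minimal sketch, not part of the official procedure: the function name kubelet_cert_expired, the --check switch, and the 200-line window are hypothetical choices made for illustration, and the patterns are taken only from the messages quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the log text on stdin contains either
# kubelet message quoted in this article, 1 otherwise.
kubelet_cert_expired() {
    grep -Eq 'bootstrap client certificate is expired|unable to load bootstrap kubeconfig'
}

# Optional live check on the appliance (run as root). It only runs when the
# script is called with --check, so sourcing the file has no side effects.
if [ "${1:-}" = "--check" ]; then
    if journalctl -u kubelet --no-pager -n 200 | kubelet_cert_expired; then
        echo "kubelet log shows the expired-certificate signature"
    else
        echo "signature not found in the last 200 kubelet log lines"
    fi
fi
```

A match only indicates that the symptom described here is present; it does not by itself confirm the root cause.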
VMware Aria Automation 8.x
VMware Aria Automation Orchestrator 8.x
This issue is caused by the kubelet service certificate expiring after one year.
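To confirm the cause before starting recovery, the certificate's expiry date can be read directly with openssl. This is a sketch under stated assumptions: the path /var/lib/kubelet/pki/kubelet-client-current.pem is the kubeadm default and is assumed, not confirmed, to apply to the appliance, and the function name cert_expires_within is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Exit codes:
#   0  certificate in $1 expires within $2 seconds (default 0 = already expired)
#   1  certificate is still valid for that window
#   2  certificate file is missing or unreadable
cert_expires_within() {
    local pem="$1" seconds="${2:-0}"
    [ -r "$pem" ] || return 2
    # "openssl x509 -checkend N" exits 0 when the certificate will NOT expire
    # within N seconds, so the result is inverted here.
    if openssl x509 -in "$pem" -noout -checkend "$seconds" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Assumption: kubeadm default location of the kubelet client certificate.
KUBELET_CERT="${KUBELET_CERT:-/var/lib/kubelet/pki/kubelet-client-current.pem}"

if [ -r "$KUBELET_CERT" ]; then
    # Print the notAfter date, then report whether it has already passed.
    openssl x509 -in "$KUBELET_CERT" -noout -enddate
    if cert_expires_within "$KUBELET_CERT"; then
        echo "kubelet client certificate has expired"
    fi
fi
```

Passing a window such as 2592000 (30 days) as the second argument would flag a certificate that is close to expiry, which may help on appliances that are still healthy.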
8.0 - 8.8: # /opt/scripts/recover_etcd.sh --confirm /root/backup-12345
8.12 and newer: # vracli etcd restore --local --confirm /root/backup-123456789.db; systemctl start etcd
kubectl get vaconfig -o yaml | tee /root/vaconfig.yaml
vracli cluster leave
kubectl apply -f /root/vaconfig.yaml --force
/opt/scripts/deploy.sh
vracli cluster leave
8.0 - 8.8: # /opt/scripts/recover_etcd.sh --confirm /root/backup-12345
8.12 and newer: # vracli etcd restore --local --confirm /root/backup-123456789.db; systemctl start etcd
kubectl get vaconfig -o yaml | tee /root/vaconfig.yaml
vracli cluster leave
kubectl apply -f /root/vaconfig.yaml --force
vracli cluster join [primary-node] --preservedata
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
https://github.com/kubernetes/kubeadm/issues/1753