After approximately one year of continuous operation, an SSPI/SSP deployment's management and workload Kubernetes clusters can stop functioning because internal certificates - including kubeadm control-plane certificates and kubelet server certificates - expire without being renewed.
Once expired, attempts to interact with either cluster from the SSPI appliance will fail with x509 certificate errors or "no route to host" errors:
sspi:~$ kubectl get node
Error: x509: certificate has expired or is not yet valid
sspi:~$ k get node
Error: x509: certificate has expired or is not yet valid
This article explains how to recover from this state using the provided recovery script, and how to install the preventive script to avoid recurrence.
Earlier versions of SSPI/SSP did not proactively renew internal Kubernetes certificates for the management or workload clusters. After one year of continuous operation, both clusters may stop functioning due to expired certificates.
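To confirm that expired certificates are the cause before running any recovery, the certificate end dates on a cluster node can be checked directly. The following is a minimal sketch, not part of the SSPI tooling: the function name check_cert and the PKI_DIR variable are hypothetical, and the path /etc/kubernetes/pki is kubeadm's default certificate directory, which is assumed (not confirmed) to apply to SSPI/SSP nodes. It relies only on the standard openssl x509 -checkend option.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical, not shipped with SSPI): classify a PEM
# certificate as MISSING, EXPIRED, EXPIRING (within warn_days), or OK.
check_cert() {
  local cert="$1" warn_days="${2:-30}"
  if [ ! -r "$cert" ]; then
    echo "MISSING: $cert"
  elif ! openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
    # -checkend N exits non-zero if the certificate expires within N seconds.
    # A file that cannot be parsed as a certificate also lands here.
    echo "EXPIRED: $cert"
  elif ! openssl x509 -checkend "$((warn_days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
    echo "EXPIRING: $cert"
  else
    echo "OK: $cert"
  fi
}

# Assumption: kubeadm's default certificate directory; override with PKI_DIR.
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki}"
for cert in "$PKI_DIR"/*.crt; do
  [ -e "$cert" ] || continue   # skip the unexpanded glob when no files match
  check_cert "$cert" 30
done
```

Run as root on a control-plane node, any line beginning with EXPIRED points to a certificate that needs renewal. Where the kubeadm binary is available on the node, kubeadm certs check-expiration may give a similar summary for the control-plane certificates, depending on the kubeadm version.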
The recovery process has two phases: first run the recovery script, then install the preventive script.
To recover the setup, download the script recover_ssp_with_expired_k8s_certs.sh to the home directory of the root or sysadmin user on the SSPI appliance, then execute it as root. The recovery script performs the following three steps, which bring the management cluster, the VC connections, and the workload control plane back to a working state. To fully recover SSPI and SSP, the preventive script must also be installed.