ESXi continuously attempts to rotate the Spherelet certificates and reports "Failed to auto-rotate the certificates for Spherelet".
Spherelet repeatedly receives "Unauthorized" responses when trying to rotate its certificates.
/var/log/vmware/wcp/wcpsvc.log from vCenter
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
/var/log/pods/cert-manager-controller from supervisor
YYYY-DD-MMTHH:MM:SS stderr F E0101 15:42:04.118425 1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-controller: Unauthorized
YYYY-DD-MMTHH:MM:SS stderr F E0101 15:42:38.116808 1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-controller: Unauthorized
YYYY-DD-MMTHH:MM:SS stderr F E0101 15:43:12.116062 1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-controller: Unauthorized
/var/log/spherelet.log from ESXi
YYYY-DD-MMTHH:MM:SS No(##) spherelet[#######]: E1104 07:30:50.040029 ####### certificate_manager.go:###] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
YYYY-DD-MMTHH:MM:SS No(##) spherelet[#######]: E1104 07:30:50.040114 ####### certificate_manager.go:###] kubernetes.io/kube-apiserver-client-kubelet: Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
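To gauge how often the rotation is being rejected, the matching lines in the Spherelet log can be counted. The following is a minimal sketch, not an official tool: the function name count_rotation_failures is hypothetical, and the pattern is taken only from the sample log lines shown above.

```shell
# Count Spherelet log lines where certificate_manager.go reports "Unauthorized".
# count_rotation_failures is a hypothetical helper name, not part of any VMware tooling.
count_rotation_failures() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's exit status clean.
    grep -c 'certificate_manager.go.*Unauthorized' "$log_file" || true
}
```

On an ESXi host this could be run as: count_rotation_failures /var/log/spherelet.log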
The bootstrap certificates (client.crt and server.crt) have expired, while kubelet-client-current.pem still shows as valid.
Example:
Bootstrap certificate:
# openssl x509 -text -in /etc/vmware/spherelet/client.crt | grep Not
Not Before: Month date HH:MM:SS 2024 GMT
Not After : Month date HH:MM:SS 2025 GMT
# openssl x509 -text -in /etc/vmware/spherelet/spherelet.crt | grep Not
Not Before: Month date HH:MM:SS 2024 GMT
Not After : Month date HH:MM:SS 2025 GMT
Current certificate:
# openssl x509 -text -in /etc/vmware/spherelet/kubelet-client-current.pem | grep Not
Not Before: Month date HH:MM:SS 2025 GMT
Not After : Month date HH:MM:SS 2026 GMT
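Instead of reading the dates by eye, the same check can be scripted with the -checkend option of openssl x509, which exits 0 when the certificate is still valid the given number of seconds from now. This is a sketch under assumptions: check_cert is a hypothetical helper name, and the file paths are the ones used in the example above.

```shell
# Report whether a certificate file is still valid right now.
# check_cert is a hypothetical helper name used for illustration only.
check_cert() {
    cert="$1"
    if [ ! -f "$cert" ]; then
        echo "MISSING $cert"
        return 2
    fi
    # "-checkend 0" exits 0 only if the certificate has not expired yet.
    # A file that openssl cannot parse also lands in the else branch.
    if openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
        echo "VALID $cert"
    else
        echo "EXPIRED_OR_INVALID $cert"
        return 1
    fi
}
```

For example: check_cert /etc/vmware/spherelet/client.crt, followed by the same call for /etc/vmware/spherelet/kubelet-client-current.pem. In the situation described here, the first would be expected to report EXPIRED_OR_INVALID and the second VALID.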
Note: This problem can only occur on a host that has not undergone any modification or interruption during the 1.5-year period.
Any of the below actions (that result in a supervisor restart of Spherelet) will reset the timeline to Day 0 as the bootstrap certificates will be refreshed:
vSphere with Tanzu
VMware ESXi
VMware vCenter Server 8.0 U3
The "Unauthorized" error occurs because the expired bootstrap client certificate is incorrectly used during the auto-rotation attempt.
Spherelet uses a client configuration that points to the expired bootstrap certificate instead of the current certificate, which results in the "Unauthorized" errors:
# cat /etc/vmware/spherelet/spherelet.conf | grep client-certificate
client-certificate: /etc/vmware/spherelet/client.crt
The problem occurs once the bootstrap client certificate issued to Spherelet expires. After that, Spherelet can no longer rotate its client certificate properly and eventually ends up using an expired current client certificate.
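The configuration check above can be wrapped in a small function that flags when the client certificate still points to the bootstrap file. This is only an illustrative sketch: check_client_cert_config is a hypothetical name, and it assumes the client-certificate entry appears on a single line as in the output shown above.

```shell
# Print which client certificate Spherelet is configured to use.
# check_client_cert_config is a hypothetical helper name for illustration.
check_client_cert_config() {
    conf="${1:-/etc/vmware/spherelet/spherelet.conf}"
    # Extract the value that follows "client-certificate:" (first match only).
    path=$(sed -n 's/^[[:space:]]*client-certificate:[[:space:]]*//p' "$conf" 2>/dev/null | head -n 1)
    case "$path" in
        */client.crt) echo "bootstrap: $path" ;;   # the case described in this article
        "")           echo "not set" ;;
        *)            echo "other: $path" ;;
    esac
}
```

Called without arguments, it reads /etc/vmware/spherelet/spherelet.conf; a result starting with "bootstrap:" matches the condition described here.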
VMware by Broadcom is aware of the issue, and the fix will be included in a future version.
Workaround:
Manually rotate the certificates using the certmgr tool from the KB article "Replace vSphere with Tanzu / vSphere Kubernetes Service Supervisor Certificates":
certmgr certificates rotate --spherelet-only
In case of any issues during manual certificate rotation, please reach out to the Broadcom Support Team for further assistance.
Key Considerations
Expired Certificates: If the current Spherelet certificate expires, the ESXi host will appear as "Not Ready" in vCenter, and the "Host Config" will be stuck in "Configuring".
Time Sensitivity: Spherelet certificates are highly sensitive to time differences. If the ESXi host's time is not synced (NTP) with the vCenter/Supervisor Cluster, the certificates may be treated as not valid.
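A rough way to spot a clock difference is to capture the UTC epoch time (date -u +%s) on the ESXi host and on vCenter at about the same moment and compare the two values. The helper below is a minimal sketch; clock_skew_seconds is a hypothetical name, and this article does not define how much skew is acceptable.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# clock_skew_seconds is a hypothetical helper name for illustration.
clock_skew_seconds() {
    host_epoch="$1"
    ref_epoch="$2"
    diff=$((host_epoch - ref_epoch))
    # Report the magnitude only, regardless of which clock is ahead.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    echo "$diff"
}
```

For example, clock_skew_seconds "$(date -u +%s)" <value captured on vCenter> prints the difference in seconds between the local clock and the reference value.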