ESXi nodes become NotReady after rotating Supervisor Certificates using certmgr

Article ID: 387476

Products

VMware vSphere with Tanzu

Issue/Introduction

After a successful run of '/root/certmgr certificates rotate' to rotate the Supervisor cluster's certificates, all Supervisor worker nodes (i.e., the ESXi hosts) become 'NotReady'.

# k get node
NAME                              STATUS     ROLES
4219d886d7ffc047f6a429be5babcdef  Ready      control-plane,master
esxihost0                         NotReady   agent
esxihost1                         NotReady   agent
esxihost2                         NotReady   agent

However, after some time has passed, all of them return to 'Ready' without any corrective action being taken.

Environment

vSphere with Tanzu

Cause

ESXi host's spherelet.log:

time="2025-01-10T09:11:02Z" level=error msg="Failed to retrieve node" error=Unauthorized

Supervisor kube-apiserver.log:

2025-01-10T09:11:02Z stderr F E0110 09:11:02.902161 1 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2025-01-10T09:11:02Z is before 2025-01-10T16:59:32Z, verifying certificate SN=7, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2025-01-10T09:11:02Z is before 2025-01-10T16:59:32Z]"

The spherelet.log and kube-apiserver.log entries above show that the spherelet certificate generated on the host is not yet valid. It becomes valid at 2025-01-10T16:59:32Z, which explains why the host nodes became 'Ready' only after that time.
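As a rough illustration of how long the nodes stay 'NotReady', the two timestamps in the kube-apiserver error can be subtracted. The following is only a sketch in Python: the function name time_until_valid is hypothetical, and the regular expression assumes the message format shown in the log excerpt above.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the two UTC timestamps in an x509 "not yet valid" message.
# Assumption: the message has the same shape as the excerpt above.
PATTERN = re.compile(
    r"current time (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z "
    r"is before (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z"
)


def time_until_valid(log_line: str) -> timedelta:
    """Return how long after the logged time the certificate becomes valid."""
    match = PATTERN.search(log_line)
    if match is None:
        raise ValueError("no 'current time ... is before ...' pair found")
    current, not_before = (
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
        for ts in match.groups()
    )
    return not_before - current


line = (
    "x509: certificate has expired or is not yet valid: "
    "current time 2025-01-10T09:11:02Z is before 2025-01-10T16:59:32Z"
)
print(time_until_valid(line))  # 7:48:30
```

In this example the certificate is rejected for a little under eight hours, which is consistent with a start time that is stamped several hours in the future.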

Resolution

  • Synchronize the time across the environment, i.e., all ESXi hosts, vCenter Server, and the Supervisor control plane VMs (CPVMs).
    Implementing a time synchronization solution such as NTP is highly recommended.

  • In another scenario, time is synchronized across the whole environment, but the vCenter Server time zone is not UTC (e.g., UTC+8).
    Renewing the Supervisor certificates with the wcp_cert_manager tool from Replace vSphere with Tanzu Supervisor Certificates then causes the spherelet certificate on the ESXi hosts to be renewed with an invalid start time, one that is ahead of the real time by the vCenter Server's UTC offset. For example:

    • The vCenter Server time is April 9 09:19:32 CST (UTC+8) when the wcp_cert_manager tool is run on vCenter Server.
    • The spherelet certificate on ESXi is renewed with an invalid start time of April 9 09:19:32 GMT, when it should be April 9 01:19:32 GMT.
    • As a result, the spherelet certificate is not yet valid and the spherelet cannot authenticate to the Supervisor's kube-apiserver.

    To work around the issue in this scenario, change the time zone of vCenter Server to UTC before running the wcp_cert_manager tool on vCenter Server. For more information, see Configure the System Time Zone and Time Synchronization Settings.
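
The time zone example above can be reproduced as simple arithmetic. This is a minimal sketch in Python, not the tool's actual logic: it only assumes that the local wall-clock reading ends up labeled as UTC, and the year 2025 is an arbitrary assumption because the example gives no year.

```python
from datetime import datetime, timedelta, timezone

# vCenter Server local time zone from the example (UTC+8).
CST = timezone(timedelta(hours=8))

# Local time when the tool is run. The year is an assumption for illustration.
local_now = datetime(2025, 4, 9, 9, 19, 32, tzinfo=CST)

# Correct start time: the same instant expressed in UTC.
correct_start = local_now.astimezone(timezone.utc)

# Faulty start time: the local wall-clock digits relabeled as UTC.
faulty_start = local_now.replace(tzinfo=timezone.utc)

print(correct_start.strftime("%B %d %H:%M:%S"))  # April 09 01:19:32
print(faulty_start.strftime("%B %d %H:%M:%S"))   # April 09 09:19:32
print(faulty_start - correct_start)              # 8:00:00
```

The difference equals the UTC offset of the vCenter Server time zone, so with vCenter Server set to UTC the two values coincide and the certificate is valid immediately.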