update-node-password-runner" resulting in continuous pod churn
osConfiguration.password.renewalDaysBeforeExpiry, which is available with ClusterClass version builtin-generic-v3.4.0 and later.
The vmware-system-tkg-controller-manager pod logs under /var/log/pods/svc-tkg-domain-c#_vmware-system-tkg-controller-manager-##########/manager show entries similar to the following:
YYYY-MM-DDTHH:MM:SS stderr F I0209 HH:MM:SS 1 password_controller.go:###] "Cluster needs reconciliation" logger="svc-tkg-domain-c#-tkg-controller.password-controller" name="<tkg-cluster-name>" namespace="<tkg-ns-name>" lastUpdateTimestamp="YYYY-MM-DD HH:MM:SS +0000 UTC" passwordAge="<#h#m#s>" renewalDaysBeforeExpiry="<#h#m#s>"
YYYY-MM-DDTHH:MM:SS stderr F E0209 HH:MM:SS 1 controller.go:347] "Reconciler error" err="admission webhook \"capi.validating.tanzukubernetescluster.run.tanzu.vmware.com\" denied the request: vm class(es): "<tkg-vmclass-name>" not found" controller="password-controller" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="<tkg-ns-name>/<tkg-cluster-name>" namespace="<tkg-ns-name>" name="<tkg-cluster-name>" reconcileID="########-####-####-####-############"
YYYY-MM-DDTHH:MM:SS stderr F I0209 HH:MM:SS 1 password_controller.go:###] "Cluster needs reconciliation" logger="svc-tkg-domain-c#-tkg-controller.password-controller" name="<tkg-cluster-name>" namespace="<tkg-ns-name>" lastUpdateTimestamp="YYYY-MM-DD HH:MM:SS +0000 UTC" passwordAge="<#h#m#s>" renewalDaysBeforeExpiry="<#h#m#s>"
YYYY-MM-DDTHH:MM:SS stderr F I0209 HH:MM:SS 1 password_controller.go:###] "Creating update runner in guest cluster" logger="svc-tkg-domain-c#-tkg-controller.password-controller" name="<tkg-cluster-name>" namespace="<tkg-ns-name>"
YYYY-MM-DDTHH:MM:SS stderr F I0209 HH:MM:SS 1 password_controller.go:###] "Guest cluster update runner created" logger="svc-tkg-domain-c#-tkg-controller.password-controller" name="<tkg-cluster-name>" namespace="<tkg-ns-name>"
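To confirm this pattern in a large log file, the entries above can be counted with a short script. The following is a minimal sketch, not a VMware tool: the function name summarize_password_controller_log is hypothetical, and the regular expressions are assumptions derived only from the sample entries shown above, so the real log formatting may need adjusted patterns.

```python
import re
from collections import Counter

# Assumption: both patterns are modeled on the sample log entries in this article.
# The optional backslash allows for quotes that appear escaped (\") in the raw log.
VM_CLASS_ERROR = re.compile(r'vm class\(es\):\s*\\?"([^"\\]+)\\?"\s*not found')
RUNNER_CREATED = re.compile(
    r'"Creating update runner in guest cluster".*?\bname="([^"]+)"'
)


def summarize_password_controller_log(lines):
    """Count webhook denials per missing VM class and runner creations per cluster.

    Returns a tuple (missing_classes, runner_creations), both Counter objects.
    """
    missing_classes = Counter()
    runner_creations = Counter()
    for line in lines:
        match = VM_CLASS_ERROR.search(line)
        if match:
            # Key: the VM class name the admission webhook reported as not found.
            missing_classes[match.group(1)] += 1
        match = RUNNER_CREATED.search(line)
        if match:
            # Key: the cluster name for which an update runner was created.
            runner_creations[match.group(1)] += 1
    return missing_classes, runner_creations
```

Passing the lines of the manager log to this function returns two counters. A cluster whose runner-creation count keeps growing alongside "not found" denials for a VM class matches the symptom described here.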
vSphere with Tanzu
vSphere Kubernetes Service (VKS) 3.4
Since VKS 3.4, the "update-node-password-runner" DaemonSet is triggered only when the cluster password is expired. It is expected to rotate the cluster SSH password and update the cluster annotation lastPasswordtimeStamp. VKS sets this annotation after the password rotation is done. Once the annotation is set, the rotation is considered successful and the password update is not retried.
A failure of the validating admission webhook (often caused by an environmental issue such as a missing VM class) can block any further update to the cluster. As a result, the controller never gets the chance to add the annotation to the cluster. The password controller treats the missing annotation as a failed update and retriggers the password rotation DaemonSet, which produces the continuous pod churn.
Add the missing VM class to the namespace (see "Manage VM Classes on a Namespace in vSphere with Tanzu") and ensure that the VM class defined in the cluster YAML is associated with the cluster's namespace.
Note:
If the VM class is reattached to the namespace without making any changes to the Cluster object, no rollout will be triggered.
However, if a new VM class is required, it must first be associated with the namespace and then updated in the Cluster object. Updating the Cluster object with the new VM class will trigger a rollout.
By default, VKS starts the update when the password age reaches expiry (90 days) - renewalDaysBeforeExpiry (7 days) = 83 days, to prevent the user from being locked out of the cluster nodes.
This feature is enabled by default, and renewalDaysBeforeExpiry defaults to 7. Setting renewalDaysBeforeExpiry to 0 using osConfiguration.password.renewalDaysBeforeExpiry disables the feature entirely.
For example, if the user sets renewalDaysBeforeExpiry to 14 using osConfiguration.password.renewalDaysBeforeExpiry, the password rotation will occur after 76 days.
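The arithmetic above can be checked with a small helper. This is an illustrative sketch only, not VKS code: the function name rotation_age_days is hypothetical, and the 90-day expiry and 7-day default are the values stated in this article.

```python
EXPIRY_DAYS = 90  # password expiry period stated in this article
DEFAULT_RENEWAL_DAYS_BEFORE_EXPIRY = 7  # default stated in this article


def rotation_age_days(renewal_days_before_expiry=DEFAULT_RENEWAL_DAYS_BEFORE_EXPIRY,
                      expiry_days=EXPIRY_DAYS):
    """Return the password age (in days) at which rotation starts.

    Returns None when renewal_days_before_expiry is 0, which this article
    describes as disabling the feature entirely.
    """
    if renewal_days_before_expiry < 0:
        raise ValueError("renewal_days_before_expiry must not be negative")
    if renewal_days_before_expiry == 0:
        return None  # automatic rotation disabled
    return expiry_days - renewal_days_before_expiry


print(rotation_age_days())    # default: 90 - 7
print(rotation_age_days(14))  # 90 - 14
print(rotation_age_days(0))   # feature disabled
```

With the default the helper returns 83, with a value of 14 it returns 76, and with 0 it returns None to signal that automatic rotation is disabled.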