I0918 14:32:49.364540 1 machineset_controller.go:476] "Created machine 1 of 1 with name \"wkld-md-1-xxxxx-c695b5dxx-p9999\"" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" machineSet="tkg-system/wkld-md-1-xxxxx-c695b5dxx" namespace="tkg-system" name="wkld-md-1-xxxxx-c695b5dxx" reconcileID=9be7c4b0-60b5-4264-9397-5542a98xxxxx
I0918 14:33:55.360129 1 machinehealthcheck_controller.go:431] "Target has failed health check, marking for remediation" controller="machinehealthcheck" controllerGroup="cluster.x-k8s.io" controllerKind="MachineHealthCheck" machineHealthCheck="tkg-system/wkld-md-1-xxxxx" namespace="tkg-system" name="wkld-md-1-xxxxx" reconcileID=9da00031-982c-46fc-9f5b-f676c47xxxxx cluster="wkld" target="tkg-system/wkld-md-1-xxxxx/wkld-md-1-xxxxx-c695b5dxx-p9999/wkld-md-1-xxxxx-c695b5dxx-p9999" reason="UnhealthyNode" message="Condition Ready on node is reporting status False for more than 12m0s"
To recover from this issue, follow these steps:
1.) Pause the cluster reconciliation to stop the loop of continuously deleting and recreating the nodes and to avoid IP address exhaustion:
kubectl patch cluster <cluster-name> --type merge -p '{"spec":{"paused": true}}'
2.) Configure time synchronization across the ESXi hosts, then confirm that the time on the hosts is synchronized.
3.) Make sure that the DHCP server has available IP addresses. If needed, release any IP address leases that are no longer assigned to a node.
4.) Resume the cluster reconciliation:
kubectl patch cluster <cluster-name> --type merge -p '{"spec":{"paused": false}}'
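The pause and resume commands in steps 1 and 4 can be wrapped in small helper functions that also read the flag back after patching. This is a minimal sketch, not part of the original procedure. The function names (paused_patch, set_cluster_paused, list_machines) are hypothetical. The explicit -n <namespace> argument is an assumption: the commands above omit it, and the log excerpt suggests the cluster objects live in tkg-system.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and do not come from the article.

# Build the merge-patch payload used in steps 1 and 4.
paused_patch() {
  # $1 must be "true" or "false".
  case "$1" in
    true|false) printf '{"spec":{"paused": %s}}' "$1" ;;
    *) echo "usage: paused_patch true|false" >&2; return 1 ;;
  esac
}

# Pause or resume reconciliation, then read back spec.paused to confirm the change.
# Assumption: the Cluster object is namespaced, so a namespace is passed explicitly.
set_cluster_paused() {
  local cluster="$1" namespace="$2" state="$3" patch
  patch="$(paused_patch "$state")" || return 1
  kubectl patch cluster "$cluster" -n "$namespace" --type merge -p "$patch"
  kubectl get cluster "$cluster" -n "$namespace" -o jsonpath='{.spec.paused}{"\n"}'
}

# List Machine objects to check that nodes are no longer being deleted and recreated.
list_machines() {
  kubectl get machines -n "$1"
}
```

For example, run set_cluster_paused <cluster-name> <namespace> true before fixing time synchronization and DHCP, and set_cluster_paused <cluster-name> <namespace> false afterwards. Running list_machines <namespace> a few minutes after resuming should show whether the worker machines stay in place.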