[timestamp] stderr F [timestamp] machinehealthcheck_controller.go:434] "Target has failed health check, marking for remediation" controller="machinehealthcheck" controllerGroup="cluster.#-###.io" controllerKind="MachineHealthCheck" MachineHealthCheck="###-###/###-###-###-######-##-#####" namespace="###-###" name="###-###-###-######-##-#####" reconcileID=########-####-####-####-############ Cluster="###-###/###-###-###" target="###-###/###-###-###-######-##-#####/###-###-###-######-##-#####-###############-#####/###-###-###-######-##-#####-###############-#####" reason="UnhealthyNode" message="Condition Ready on node is reporting status Unknown for more than 5m0s"
kubectl logs -n <TKG_NAMESPACE> -l name=capi-kubeadm-control-plane-controller-manager -c manager
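The controller log can be noisy, so it may help to reduce it to the remediation events only. The following is a minimal sketch, assuming the log lines keep the `target="…" reason="…" message="…"` field order shown in the message above; the function name `mhc_remediations` is hypothetical and not part of any VMware or Kubernetes tooling.

```shell
# Read controller log lines on stdin and print "target | reason | message"
# for every line that marks a machine for remediation.
# Assumes the target, reason and message fields appear in that order.
mhc_remediations() {
  grep 'marking for remediation' |
    sed -n 's/.* target="\([^"]*\)".* reason="\([^"]*\)" message="\([^"]*\)".*/\1 | \2 | \3/p'
}

# Usage (replace <TKG_NAMESPACE> with the actual namespace before running):
#   kubectl logs -n <TKG_NAMESPACE> -l name=capi-kubeadm-control-plane-controller-manager -c manager | mhc_remediations
```

Each output line then names the affected machine, the reason (for example `UnhealthyNode`) and the condition that triggered the remediation.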
vSphere with Tanzu 8
vSphere with Tanzu uses machine health checks to automatically remediate Kubernetes nodes that are considered unhealthy. The monitored node conditions include MemoryPressure, DiskPressure, PIDPressure and NetworkUnavailable. If a Kubernetes worker node reports any of these conditions for 5 minutes, it is automatically remediated (rebuilt). The log message above shows the same behavior for a node whose Ready condition has reported Unknown for more than 5m0s. See Configure MachineHealthCheck for v1beta1 Clusters for more information.
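To confirm which conditions and timeouts are actually configured for a cluster, the MachineHealthCheck objects can be inspected directly. The following is a sketch under assumptions: it assumes kubectl is pointed at the context where the MachineHealthCheck objects live, that they follow the Cluster API v1beta1 schema (`spec.unhealthyConditions` with `type`, `status` and `timeout`), and that the namespace is the one shown in the log message. The function name `show_mhc_conditions` is hypothetical.

```shell
# Print every MachineHealthCheck in a namespace, followed by its
# configured unhealthy conditions (type, status, timeout).
# Assumes the Cluster API v1beta1 field layout; adjust if it differs.
show_mhc_conditions() {
  ns="$1"
  kubectl get machinehealthcheck -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .spec.unhealthyConditions[*]}{"  "}{.type}{"="}{.status}{" timeout="}{.timeout}{"\n"}{end}{end}'
}

# Usage: show_mhc_conditions <namespace-from-the-log-message>
```

Comparing the reported timeouts with the duration in the log message helps determine whether a remediation matched the configured policy.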
While Kubernetes automatically remediates guest clusters, it is essential to monitor and maintain both the vSphere and Kubernetes environments to avoid unnecessary remediations. Each node remediation consumes CPU, network and disk resources within the vSphere environment.
VMware Cloud Foundation Operations can be used to monitor both the vSphere environment and Kubernetes clusters.