Liveness, Readiness, Tolerations and Node controller in TKGI
Readiness and Liveness:
-These two mechanisms ensure that no requests are forwarded to your application before the Pod is up and running, and that requests stop being forwarded if the application becomes unreachable.
-If your application crashes, the Pod is restarted within the time configured in the liveness probe parameters. This works because the kubelet running on the node acts on the parameters set in the probe configuration.
-If the node is gracefully shut down, this behaviour changes: the kubelet is killed together with the node, so there is no process left to watch the liveness and readiness of the Pods. The conclusion is that without the kubelet, liveness and readiness probes DO NOT work.
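The probe behaviour described above can be sketched in a Pod manifest. This is a minimal example; the name, image, port and `/healthz` path are placeholders for your own application:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app            # hypothetical name
spec:
  containers:
  - name: my-app
    image: my-app:1.0     # hypothetical image
    readinessProbe:       # gates traffic: no requests are sent until this passes
      httpGet:
        path: /healthz    # placeholder health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:        # the kubelet restarts the container when this fails
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
      failureThreshold: 3
```

Both probes are executed by the kubelet on the node, which is why neither works once the node (and with it the kubelet) is gone.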
Tolerations:
-This is another mechanism. When a node is marked as anything but "Ready", its Pods are evicted and rescheduled onto another node according to your toleration configuration.
-By default, kube-controller-manager checks the state of each node every 5 seconds. However, there is a grace period (30 seconds by default) before the node controller reports the node as down. This is designed so that a short network blip or a CPU spike that leaves the node unresponsive for a few seconds does not trigger premature Pod eviction.
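The toleration mechanism is expressed per Pod. A sketch of the relevant Pod spec fragment, using the taints the node controller applies to a NotReady/unreachable node (Kubernetes defaults `tolerationSeconds` to 300, matching the 5-minute eviction timeout discussed below):

```yaml
spec:
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300   # how long the Pod stays bound after the taint appears
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```

Lowering `tolerationSeconds` makes the Pod evict (and reschedule) sooner; raising it keeps the Pod bound to the unresponsive node longer.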
Node controller:
-The node controller is responsible for updating the NodeReady condition in NodeStatus to ConditionUnknown when a node becomes unreachable (i.e. the node controller stops receiving heartbeats, for example because the node is down), and for later evicting all Pods from the node (using graceful termination) if the node remains unreachable. The default timeouts are 30s to start reporting ConditionUnknown and a further 5m before Pod eviction starts. If you configure a toleration on the application Pod, the toleration's value overrides the 5-minute default. The node controller checks the state of each node every --node-monitor-period seconds.
More about it here: https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller
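When the grace period expires, the node controller both updates the Node's status and taints it. A sketch of the relevant Node fields after a node becomes unreachable (not a full manifest):

```yaml
spec:
  taints:
  - key: node.kubernetes.io/unreachable
    effect: NoExecute        # triggers eviction once tolerations expire
status:
  conditions:
  - type: Ready
    status: Unknown          # ConditionUnknown: heartbeats stopped
```

It is this `NoExecute` taint that the tolerations in the previous section are matched against.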
TKGI
These values are preconfigured in TKGI and should not be changed. The toleration and node-eviction settings are configured very differently from native Kubernetes: instead of being specified in static Pod YAML files, in TKGI they live in the configuration files of the monit jobs.
In TKGI:
-If you gracefully shut down a node VM from the IaaS, the Pods on that node will still show as Running even though they have stopped responding, and will stay that way until the default 5-minute timer expires. After that, Pod eviction kicks in and the Pods are restarted on another node, if one is available.
- If you configure liveness or readiness probes, they work only for as long as the kubelet on the node is running. These mechanisms exist ONLY to monitor and control application status; if the node is down, they do not work, because there is no kubelet left to run them.
- If you configure a 5-second toleration for the application, it will take 35 seconds for the Pod to be restarted on another node. This is because the default --node-monitor-grace-period in TKGI is 30 seconds (it can be 40 seconds depending on the version); adding the 5 seconds of toleration on top of the 30-second grace period gives 35 seconds in total.
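A sketch of the toleration behind the 35-second example above (assuming the 30-second grace period; with a 40-second grace period the total would be 45 seconds):

```yaml
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 5   # eviction starts ~30s (grace period) + 5s = ~35s after the node stops responding
```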