Kubelet post-start script fails repeatedly with bosh recreate. Node shows "NoRouteCreated" and "Node created without a route"

Article ID: 298646


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Scenario

While performing a BOSH activity on a worker node, such as "bosh recreate", you see the following symptoms:

The BOSH task output shows that the kubelet post-start script failed during startup:
... result: 1 of 5 post-start scripts failed. Failed Jobs: kubelet ...

Looking at the kubelet post-start.stdout.log, you only see:
kubelet failed post-start checks after 120 seconds
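
To view the full post-start output, you can read the job logs directly on the worker node. The commands below assume the standard BOSH job log location; the deployment name and VM ID are placeholders:
$ bosh -d <deployment-name> ssh worker/<vm-guid>
$ sudo cat /var/vcap/sys/log/kubelet/post-start.stdout.log
$ sudo cat /var/vcap/sys/log/kubelet/post-start.stderr.log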

Looking at the output of monit summary on the worker node, you may see the kubelet job in the "Not Monitored" state.
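
For reference, monit must be run as root on the BOSH VM, so from the worker node (after bosh ssh, as above) you can check the kubelet job state with:
$ sudo -i
# monit summary | grep kubelet
# exit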

The Kubernetes worker node in question shows a STATUS of Ready:
$ kubectl get nodes
NAME                                      STATUS   ROLES    AGE    VERSION
.
.
.
vm-748fec15-8286-4b97-63e8-1e5217e3156e   Ready    <none>   3d1h   v1.14.10

Looking at the details of the worker node, you see the issue:
$ kubectl describe node vm-748fec15-8286-4b97-63e8-1e5217e3156e
.
.
.
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   True    Mon, 01 Jan 0001 00:00:00 +0000   Fri, 08 Jan 2021 17:48:15 +0000   NoRouteCreated               Node created without a route
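
If you want to check this condition across all nodes at once, a JSONPath query like the following (illustrative only, not part of the original procedure) prints the NetworkUnavailable status per node; any node reporting True is affected:
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}{end}'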

If you run the kubelet post-start script manually, the kubelet comes online and the issue appears to be resolved. Once you have SSH'd into the problematic worker node:
$ sudo -i
# cd /var/vcap/jobs/kubelet/bin
# ./post-start
# exit
$ exit

$ kubectl describe node vm-748fec15-8286-4b97-63e8-1e5217e3156e
.
.
.
Conditions:

<no longer shows the error>
However, when you retry the VM activity, such as "bosh recreate", you run into the same scenario again.

Environment

Product Version: 1.8

Resolution

First, check whether you can drain the node normally:
kubectl drain vm-748fec15-8286-4b97-63e8-1e5217e3156e

or with the --force option:
kubectl drain vm-748fec15-8286-4b97-63e8-1e5217e3156e --force

However, it is possible that you have pods with local storage. If so, you will have to run the following (which also ignores DaemonSets):
kubectl drain vm-748fec15-8286-4b97-63e8-1e5217e3156e --delete-emptydir-data --ignore-daemonsets --force
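
After the drain completes, you can confirm that the node is cordoned (SchedulingDisabled) and that only DaemonSet-managed pods remain on it:
$ kubectl get node vm-748fec15-8286-4b97-63e8-1e5217e3156e
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=vm-748fec15-8286-4b97-63e8-1e5217e3156e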

Then retry your VM activity (such as bosh recreate).
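
For example, using the BOSH CLI v2 syntax, where <deployment-name> and <vm-guid> are placeholders you can look up with bosh instances (the instance group is typically "worker" in a TKGI deployment):
$ bosh -d <deployment-name> instances
$ bosh -d <deployment-name> recreate worker/<vm-guid>

Once the recreate completes, re-run kubectl describe node against the worker and confirm that the NoRouteCreated / "Node created without a route" condition no longer appears.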