Scenario

While performing an activity on a worker node, such as a BOSH operation like "bosh recreate", you see the following symptoms:
The bosh task output shows that the kubelet post-start script failed during startup:
... result: 1 of 5 post-start scripts failed. Failed Jobs: kubelet ...
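If the failing task output has already scrolled out of your terminal, you can replay it from the BOSH director. This is only a convenience sketch; the task ID below is a placeholder for the ID of your failed "bosh recreate" task:
$ bosh tasks --recent            # find the ID of the failed task
$ bosh task <task-id>            # replay the task output
$ bosh task <task-id> --debug    # include the full debug log for the task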
Looking at the kubelet post-start.stdout.log, you only see:
kubelet failed post-start checks after 120 seconds
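On a BOSH-deployed worker, the post-start logs for the kubelet job are normally written under /var/vcap/sys/log/kubelet/. The deployment and instance names below are placeholders; substitute the values shown by "bosh vms":
$ bosh -d <deployment> ssh worker/<instance-id>
$ sudo tail -n 50 /var/vcap/sys/log/kubelet/post-start.stdout.log
$ sudo tail -n 50 /var/vcap/sys/log/kubelet/post-start.stderr.log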
Looking at the output of monit summary on the worker node, you may see the kubelet process in the state "Not Monitored".
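To confirm this from the worker itself, run monit as root (the full binary path is shown in case monit is not on your shell's PATH):
$ sudo /var/vcap/bosh/bin/monit summary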
The Kubernetes worker node in question shows a STATUS of Ready:
$ kubectl get nodes
NAME                                      STATUS   ROLES    AGE    VERSION
...
vm-748fec15-8286-4b97-63e8-1e5217e3156e   Ready    <none>   3d1h   v1.14.10
Looking at the details of the worker node, you see the issue:
$ kubectl describe node vm-748fec15-8286-4b97-63e8-1e5217e3156e
...
Conditions:
  Type                 Status   LastHeartbeatTime                  LastTransitionTime                 Reason           Message
  ----                 ------   -----------------                  ------------------                 ------           -------
  NetworkUnavailable   True     Mon, 01 Jan 0001 00:00:00 +0000    Fri, 08 Jan 2021 17:48:15 +0000    NoRouteCreated   Node created without a route
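Rather than reading the full describe output, you can query just this condition directly. The jsonpath expression below is one way to do this and uses the node name from the example above; a node in this state returns "True":
$ kubectl get node vm-748fec15-8286-4b97-63e8-1e5217e3156e \
    -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}'
True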
If you run the kubelet post-start script manually, the kubelet comes online and the issue appears to be resolved. Once SSH'd into the problematic worker:
$ sudo -i
# cd /var/vcap/jobs/kubelet/bin
# ./post-start
# exit
$ exit
$ kubectl describe node vm-748fec15-8286-4b97-63e8-1e5217e3156e
...
Conditions:
  <no longer shows the error>
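As a convenience, the same manual workaround can be run without an interactive session by passing a command to "bosh ssh". The deployment and instance names are placeholders; note that this only clears the condition temporarily:
$ bosh -d <deployment> ssh worker/<instance-id> \
    -c 'sudo /var/vcap/jobs/kubelet/bin/post-start'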
However, when you retry the VM activity, such as "bosh recreate", you run into the same scenario again.