Enterprise PKS cluster creation fails with error "1 of 5 post-start scripts failed. Failed Jobs: kubelet"

Article ID: 345570

Products

VMware Cloud PKS

Issue/Introduction

Symptoms:

Enterprise PKS cluster creation fails with an error similar to:
1 of 5 post-start scripts failed. Failed Jobs: kubelet
Creation of the master and worker nodes completes, and the nodes are in a running state.
The BOSH task for cluster creation returns an error similar to:
Task 2060 | 07:02:16 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 2060 | 07:02:16 | Creating missing vms: master/25edd43e-c566-41c8-930e-b8f0e010bf3e (0)
Task 2060 | 07:02:16 | Creating missing vms: worker/4e598359-a5bc-4544-bfd9-ff6071658522 (0)
Task 2060 | 07:04:45 | Creating missing vms: master/25edd43e-c566-41c8-930e-b8f0e010bf3e (0) (00:02:29)
Task 2060 | 07:04:58 | Creating missing vms: worker/4e598359-a5bc-4544-bfd9-ff6071658522 (0) (00:02:42)
Task 2060 | 07:04:58 | Updating instance master: master/25edd43e-c566-41c8-930e-b8f0e010bf3e (0) (canary) (00:06:14)
Task 2060 | 07:11:12 | Updating instance worker: worker/4e598359-a5bc-4544-bfd9-ff6071658522 (0) (canary) (00:35:32)
L Error: Action Failed get_task: Task eb75b17c-0027-42bc-44ab-358733b8d976 result: 1 of 5 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns, telemetry-agent-image, wavefront-proxy-images, sink-resources-images.
Task 2060 | 07:46:44 | Error: Action Failed get_task: Task eb75b17c-0027-42bc-44ab-358733b8d976 result: 1 of 5 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns, telemetry-agent-image, wavefront-proxy-images, sink-resources-images.
In the /var/vcap/sys/log/kubelet/kubelet.stderr.log file on the worker nodes, you see entries similar to:
replacing cloudprovider-reported hostname of fe96ca33-1985-4545-aa06-7b9d90aa5925 with overridden hostname of 30.0.4.6
Failed to patch IP as MAC address 02:42:29:34:95:46 does not belong to a VMware platform
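
When triaging BOSH task output like the example above, it can help to extract which jobs failed their post-start scripts. The following is a minimal Python sketch, not part of any PKS or BOSH tooling. The function name parse_post_start_failure is hypothetical, and the pattern assumes the message format shown in this article and job names that contain no periods.

```python
import re

# Matches the agent error text shown in the BOSH task output in this article.
# Assumption: job names never contain a period, so "." ends each job list.
_POST_START_RE = re.compile(
    r"(?P<failed>\d+) of (?P<total>\d+) post-start scripts failed\.\s*"
    r"Failed Jobs: (?P<failed_jobs>[^.]*)"
    r"(?:\.\s*Successful Jobs: (?P<ok_jobs>[^.]*))?"
)


def _split_jobs(text):
    """Split a comma-separated job list into clean names."""
    if not text:
        return []
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_post_start_failure(line):
    """Return details of a post-start failure, or None if the line has none."""
    match = _POST_START_RE.search(line)
    if match is None:
        return None
    return {
        "failed_count": int(match.group("failed")),
        "total": int(match.group("total")),
        "failed_jobs": _split_jobs(match.group("failed_jobs")),
        "successful_jobs": _split_jobs(match.group("ok_jobs")),
    }


if __name__ == "__main__":
    sample = (
        "Task 2060 | 07:46:44 | Error: Action Failed get_task: "
        "Task eb75b17c-0027-42bc-44ab-358733b8d976 result: "
        "1 of 5 post-start scripts failed. Failed Jobs: kubelet. "
        "Successful Jobs: bosh-dns, telemetry-agent-image, "
        "wavefront-proxy-images, sink-resources-images."
    )
    print(parse_post_start_failure(sample))
```

A result whose failed_jobs list contains kubelet matches the symptom described in this article; other failed jobs may point to a different cause.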

Environment

VMware PKS 1.x

Cause

This issue occurs because the ephemeral disk fills up before the addons errand runs.
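
To confirm that a full ephemeral disk is the cause, you can check disk usage on an affected worker. The following is a minimal Python sketch under stated assumptions: it assumes the ephemeral disk is mounted at /var/vcap/data (the usual BOSH location, not stated in this article), that Python 3 is available where you run it, and that 90 percent is a reasonable warning threshold. The function names are hypothetical.

```python
import shutil

# Assumption: BOSH-deployed VMs mount the ephemeral disk here.
EPHEMERAL_MOUNT = "/var/vcap/data"


def disk_usage_percent(path):
    """Return the used share of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=90.0):
    """Return True when usage is at or above the given threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    try:
        percent = disk_usage_percent(EPHEMERAL_MOUNT)
    except OSError as err:
        print(f"Could not read {EPHEMERAL_MOUNT}: {err}")
    else:
        state = "nearly full" if is_nearly_full(EPHEMERAL_MOUNT) else "ok"
        print(f"{EPHEMERAL_MOUNT}: {percent:.1f}% used ({state})")
```

The reported percentage is approximate and may differ slightly from other tools because of reserved filesystem blocks.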

Resolution

This is a known issue affecting Enterprise PKS.

Workaround:

To work around this issue, increase the ephemeral disk size available to the workers:

1. Log in to the Ops Manager user interface.
2. Navigate to the appropriate plan on the PKS tile and change the Errand vm_type for the workers to one with a larger ephemeral disk.
3. Apply the changes from Ops Manager.
4. Retry creating a new PKS cluster.
