PKS upgrade failed with error "2 of 5 post-start scripts failed. Failed Jobs: telemetry-agent-image"

Article ID: 316814

Updated On:

Products

VMware Cloud PKS

Issue/Introduction

Symptoms:
  • The PKS upgrade from 1.2.7 to 1.3.1 fails.
  • The upgrade halts after two of the nodes in the cluster have been upgraded.
  • The BOSH task reports errors similar to the following:
Task 13565 | 23:39:22 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 13565 | 23:39:25 | Updating instance worker: worker/4427fda7-d962-4ec5-a594-b3526c05394a (3) (canary) (00:02:34)
                      L Error: Action Failed get_task: Task 16e218ba-5984-4fd8-6633-52ea7a790bf7 result: 2 of 5 post-start scripts failed. Failed Jobs: telemetry-agent-image, kubelet. Successful Jobs: bosh-dns, sink-resources-images, wavefront-proxy-images.
Task 13565 | 23:41:59 | Error: Action Failed get_task: Task 16e218ba-5984-4fd8-6633-52ea7a790bf7 result: 2 of 5 post-start scripts failed. Failed Jobs: telemetry-agent-image, kubelet. Successful Jobs: bosh-dns, sink-resources-images, wavefront-proxy-images.
  • You see messages similar to the following in the post-start.stderr.log:
worker/4427fda7-d962-4ec5-a594-b3526c05394a: /var/vcap/sys/log/telemetry-agent-image/post-start.stderr.log

Error processing tar file(exit status 1): open /usr/share/man/man8/systemd-suspend.service.8.gz: no space left on device
Error processing tar file(exit status 1): open /usr/share/man/man1/watch.1.gz: no space left on device
Error processing tar file(exit status 1): symlink tset.1.gz /usr/share/man/man1/reset.1.gz: no space left on device
post-start.stderr.log (END)
  • You see an error similar to the following when you run the docker load command manually:
docker load -i /var/vcap/packages/telemetry-agent-image/pkstelemetrybot_telemetry-agent:fda6005.tar

8241afc74c6f: Loading layer [==================================================>]  120.8MB/120.8MB
Error processing tar file(exit status 1): open /usr/share/man/man8/systemd-remount-fs.service.8.gz: no space left on device

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.


Environment

VMware PKS 1.x

Cause

  • The local Docker registry does not have enough free space.
  • The Docker image registry is filled with dangling images.
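
Both causes can be checked from a shell on the affected worker. The following is a minimal sketch, assuming the `docker` client is on the PATH and can reach the daemon, as in the docker load example above. The function names `docker_disk_free` and `count_dangling_images` are illustrative and not part of PKS.

```shell
# Sketch only: assumes `docker` is on the PATH of the affected worker.
# The function names are hypothetical helpers, not PKS tooling.

# Print free space on the filesystem that holds Docker's data directory.
docker_disk_free() {
  docker_root="$(docker info --format '{{.DockerRootDir}}')" || return 1
  df -h "$docker_root"
}

# Count dangling images (untagged and not referenced by any tagged image).
count_dangling_images() {
  docker images --filter dangling=true --quiet | wc -l | tr -d ' '
}
```

If `docker_disk_free` reports little or no available space and `count_dangling_images` returns a large number, the symptoms are consistent with the causes listed here.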

Resolution

To resolve the issue, identify the dangling images in the registry and remove them.

To identify the dangling images, follow the instructions in How to identify and remove dangling registry images in docker registry (68151).
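
As a rough illustration of what identifying and removing dangling images looks like with the standard Docker CLI, consider the sketch below. It is not the procedure from article 68151, which remains the authoritative reference. It assumes `docker` is on the PATH of the affected worker, and the function names are hypothetical.

```shell
# Sketch only: generic Docker CLI commands, not the procedure from KB 68151.
# The function names are hypothetical helpers.

# List dangling images with their ID and size.
list_dangling_images() {
  docker images --filter dangling=true --format '{{.ID}} {{.Size}}'
}

# Remove all dangling images without prompting for confirmation.
remove_dangling_images() {
  docker image prune --force
}
```

Review the output of `list_dangling_images` before calling `remove_dangling_images`, because the removal cannot be undone. Afterwards, rerunning the docker load command from the Symptoms section is one way to confirm that enough space has been freed.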

Additional Information

How to identify and remove dangling registry images in docker registry (68151)