Tanzu Hub upgrade fails on Kubelet job due to image pull failures caused by insufficient disk capacity

Article ID: 438476

Updated On:

Products

VMware Tanzu Platform - Hub

Issue/Introduction

  • When upgrading Tanzu Hub from 10.3.5 to 10.4, the Apply Changes operation in Operations Manager fails.
  • The error reported in Operations Manager occurs during the post-start operation on the Kafka VM (it may also occur on other VMs).
  • The reported error looks like:

    Error: Action Failed get_task: Task ########-####-####-####-############ result: 1 of 4 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns, load-images, load-antrea-images.

  • From an SSH session on the Tanzu Hub Registry VM, running kubectl get pods -A | grep -v Running shows the coredns and metrics-server pods in the kube-system namespace in ImagePullBackOff status.
  • Running kubectl get events -n kube-system shows kubelet events with the message "invalid capacity 0 on image filesystem" and the reason "InvalidDiskCapacity".
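The two checks above can be narrowed with small stdin filters. This is a minimal sketch, not part of the product: the function names stuck_image_pulls and invalid_disk_capacity_events are hypothetical, and the pod filter assumes the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). It also matches ErrImagePull, the status a pod shows before it backs off to ImagePullBackOff.

```shell
# Print the names of kube-system pods whose STATUS is ImagePullBackOff or ErrImagePull.
# Expects `kubectl get pods -A` output on stdin
# (assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
stuck_image_pulls() {
  awk '$1 == "kube-system" && ($4 == "ImagePullBackOff" || $4 == "ErrImagePull") { print $2 }'
}

# Keep only the event lines that mention the InvalidDiskCapacity reason.
# Expects `kubectl get events -n kube-system` output on stdin.
invalid_disk_capacity_events() {
  grep -F "InvalidDiskCapacity"
}

# Usage on the VM (hypothetical helper names, defined above):
#   kubectl get pods -A | stuck_image_pulls
#   kubectl get events -n kube-system | invalid_disk_capacity_events
```

If both filters return output on the same VM, the symptoms match the ones described in this article.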

Environment

This was observed in Tanzu Hub upgrades from 10.3.5 to 10.4; however, the issue is not version dependent.

Cause

In large environments, the image filesystem on certain VMs may be too small and need to be increased. In the issue reported above, the Kafka VMs were the ones short on disk space. The image filesystem is stored on the Persistent Disk.
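To confirm that disk space is the limiting factor before resizing, the Persistent Disk usage can be read on the affected VM. This is a minimal sketch under stated assumptions: /var/vcap/store is the conventional BOSH mount point for a persistent disk, but whether the image filesystem on a given Tanzu Hub VM lives under that path is an assumption here. The function names disk_use_percent and disk_nearly_full, and the default threshold of 90, are hypothetical.

```shell
# Read `df -P <path>` output on stdin and print the Capacity (use %) value
# of the first filesystem line, without the percent sign.
disk_use_percent() {
  awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Read `df -P <path>` output on stdin; succeed (exit 0) when usage is at or
# above the threshold given as the first argument (default 90, an arbitrary choice).
disk_nearly_full() {
  awk -v t="${1:-90}" '
    NR == 2 { gsub("%", "", $5); full = ($5 + 0 >= t + 0) }
    END { exit full ? 0 : 1 }'
}

# Usage on the affected VM (mount path is an assumption, see above):
#   df -P /var/vcap/store | disk_use_percent
#   df -P /var/vcap/store | disk_nearly_full 90 && echo "Persistent Disk is nearly full"
```

The -P flag keeps each filesystem on a single line in the POSIX format, which makes the fifth column reliable to parse.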

Resolution

In the Operations Manager GUI, go to the Tanzu Hub tile -> Resource Config, increase the Persistent Disk size for the affected VM, then run Apply Changes. This should allow the upgrade to complete.