Tanzu Mission Control Self-Managed (TMC-SM) won't start due to storage issues in underlying StatefulSets.


Article ID: 325581


Products

VMware Cloud Director

Issue/Introduction

This KB is designed to help identify which Pod is experiencing the issue.

Once identified, use the steps in the following KB to resolve the issue:
https://kb.vmware.com/s/article/96425

Symptoms:
  • Tanzu Mission Control Self-Managed (TMC-SM) won't start.
  • TMC-SM was deployed using VMware Cloud Director Extension for VMware Tanzu Mission Control to a VMware Cloud Director Container Service Extension Kubernetes cluster.
  • TMC-SM fails to start due to storage issues in underlying StatefulSets.
  • PVCs run out of storage space.
  • When you check the Pod logs, you see errors relating to disk usage:
$ kubectl logs <POD_NAME>

ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open <device>: no space left on device"
  • Inspecting the Pod's filesystems shows a full volume (in this example, /dev/sdb mounted on /mnt is at 100%):
$ kubectl exec <POD_NAME> -- df -h
 
Filesystem      Size  Used Avail Use% Mounted on
overlay          19G   13G  5.3G  71% /
tmpfs            64M     0   64M   0% /dev
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sda4        19G   13G  5.3G  71% /tmp
/dev/sdb         10G   10G    0M 100% /mnt
shm              64M     0   64M   0% /dev/shm
tmpfs            32G   12K   32G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware
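
On Pods with many mounts, the full volume can be easy to miss in the df output. The following is a minimal sketch of a filter that prints only the mount points at or above a usage threshold. The function name full_mounts and the default threshold of 90 are illustrative assumptions, not part of TMC-SM or kubectl.

```shell
# full_mounts [THRESHOLD]: read `df -h` output on stdin and print the mount
# point and Use% of every filesystem at or above THRESHOLD percent.
# THRESHOLD defaults to 90 (an arbitrary choice for this sketch).
full_mounts() {
  awk -v limit="${1:-90}" '
    NR > 1 {                     # skip the df header line
      use = $5                   # fifth column is Use%, for example "100%"
      sub(/%/, "", use)          # drop the percent sign before comparing
      if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

It can be fed directly from the command above, for example: kubectl exec <POD_NAME> -- df -h | full_mounts 90. For the sample output shown, this prints "/mnt 100%".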


Resolution

Four main Pods can experience this issue, and each has a different method for reviewing its current space usage:
  1. AlertManager
  2. Kafka
  3. Postgres
  4. Prometheus
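
The df checks described in the sections below can also be run in a single pass. The following is a rough sketch, assuming the Pod names, container names, and mount paths used in this article and that the current kubectl context and namespace point at the TMC-SM deployment. The function name check_tmc_volumes is hypothetical. Prometheus is left out because this article checks it through its Pod status and logs rather than with df.

```shell
# check_tmc_volumes: run the AlertManager, Kafka and Postgres df checks
# from this article one after another.
check_tmc_volumes() {
  # Each row is: <pod> <container> <mount path>
  while read -r pod container mount; do
    echo "== $pod ($mount)"
    # </dev/null keeps kubectl from consuming the remaining rows on stdin.
    kubectl exec "$pod" -c "$container" -- df -h "$mount" </dev/null
  done <<'EOF'
alertmanager-tmc-local-monitoring-tmc-local-0 alertmanager /data
kafka-controller-0 kafka /bitnami/kafka
postgres-postgresql-0 postgresql /bitnami/postgresql
EOF
}
```

Run check_tmc_volumes and look for a volume at 100%. If the Pod names in your environment differ, adjust the rows accordingly.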


AlertManager

$ kubectl exec alertmanager-tmc-local-monitoring-tmc-local-0 -c alertmanager -- df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdd        2.0G  2.0G    0M 100% /data



Follow the steps in KB 96425, replacing every reference to <PACKAGE_INSTALL_NAME> with tmc-local-monitoring.

 

Kafka

$ kubectl exec kafka-controller-0 -c kafka -- df -h /bitnami/kafka
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc          5G    5G    0M 100% /bitnami/kafka



Follow the steps in KB 96425, replacing every reference to <PACKAGE_INSTALL_NAME> with kafka.

 

Postgres

$ kubectl exec postgres-postgresql-0 -c postgresql -- df -h /bitnami/postgresql
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        7.8G  7.8G    0M 100% /bitnami/postgresql



Follow the steps in KB 96425, replacing every reference to <PACKAGE_INSTALL_NAME> with postgres.
 

Prometheus

Check the Pod status. In this example, the affected Prometheus Pod is in CrashLoopBackOff:

$ kubectl get pods
prometheus-server-tmc-local-monitoring-tmc-local-0   1/2     CrashLoopBackOff   1327 (3m52s ago)   13d



Then check the logs of the prometheus container for a "no space left on device" error:

$ kubectl logs prometheus-server-tmc-local-monitoring-tmc-local-0 -c prometheus

ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open /prometheus/wal/00000582: no space left on device"
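
If the log is long, the relevant lines can be filtered out. The following is a small sketch that matches the error text shown above. The function name disk_full_errors is an illustrative assumption.

```shell
# disk_full_errors: read Pod logs on stdin and print only the lines that
# report an exhausted volume. The exit status is 0 when at least one
# matching line is found and non-zero otherwise.
disk_full_errors() {
  grep -i 'no space left on device'
}
```

For example: kubectl logs prometheus-server-tmc-local-monitoring-tmc-local-0 -c prometheus | disk_full_errors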



Follow the steps in KB 96425, replacing every reference to <PACKAGE_INSTALL_NAME> with tmc-local-monitoring.
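
The Pod-to-package mapping used in the sections above can be summarized as a small lookup. This is only a sketch built from the values in this article. The function name package_install_name is hypothetical, and the name patterns assume the Pod names shown above.

```shell
# package_install_name POD: print the value to substitute for
# <PACKAGE_INSTALL_NAME> in KB 96425, based on the Pod name prefix.
package_install_name() {
  case "$1" in
    alertmanager-*|prometheus-server-*) echo "tmc-local-monitoring" ;;
    kafka-*)                            echo "kafka" ;;
    postgres-*)                         echo "postgres" ;;
    *)
      # Pods not covered by this article are reported as an error.
      echo "unknown pod: $1" >&2
      return 1
      ;;
  esac
}
```

For example, package_install_name kafka-controller-0 prints kafka.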