Tanzu Mission Control Self-Managed (TMC-SM) won't start due to storage issues in underlying StatefulSets.

Article ID: 325581

Updated On: 06-11-2024

Products

Tanzu Mission Control

Issue/Introduction

This KB is designed to help identify which Pod is experiencing the issue.

Once identified, use the steps in the following KB to resolve the issue:
https://knowledge.broadcom.com/external/article?legacyId=96425

Symptoms:

Tanzu Mission Control Self-Managed (TMC-SM) won't start.
TMC-SM was deployed using VMware Cloud Director Extension for VMware Tanzu Mission Control to a VMware Cloud Director Container Service Extension Kubernetes cluster.
TMC-SM fails to start due to storage issues in underlying StatefulSets.
PVCs run out of storage space.
The Pod logs show errors relating to disk usage:

$ kubectl logs <POD_NAME>

ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open <device>: no space left on device"

Inspecting the Pod's filesystems shows a volume at 100% usage (in this example, /dev/sdb mounted on /mnt):

$ kubectl exec <POD_NAME> -- df -h

Filesystem Size Used Avail Use% Mounted on
overlay 19G 13G 5.3G 71% /
tmpfs 64M 0 64M 0% /dev
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda4 19G 13G 5.3G 71% /tmp
/dev/sdb 10G 10G 0M 100% /mnt
shm 64M 0 64M 0% /dev/shm
tmpfs 32G 12K 32G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 16G 0 16G 0% /proc/acpi
tmpfs 16G 0 16G 0% /proc/scsi
tmpfs 16G 0 16G 0% /sys/firmware
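
When the df output is long, the full volume can be easy to miss. The following is a minimal sketch, not part of the original procedure, that filters df -h style output down to the mount points at or above a usage threshold. The function name full_mounts and the default threshold of 95 are illustrative assumptions, and the sketch assumes mount points contain no spaces.

```shell
# Print mount points whose Use% is at or above a threshold (default 95).
# Reads `df -h` style output on stdin; the header line is skipped.
# full_mounts is a hypothetical helper name, not part of kubectl or TMC-SM.
full_mounts() {
  awk -v limit="${1:-95}" '
    NR > 1 {
      pct = $5              # fifth column is Use%, for example "100%"
      sub(/%/, "", pct)     # strip the percent sign
      if (pct + 0 >= limit) print $6   # sixth column is the mount point
    }'
}

# Usage, with <POD_NAME> as in the commands above:
#   kubectl exec <POD_NAME> -- df -h | full_mounts
```

Run against the sample output above, this should list only /mnt.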

Resolution

There are 4 main Pods that can experience the issue, each of which has a different method for reviewing its current space usage.

AlertManager
Kafka
Postgres
Prometheus

AlertManager

$ kubectl exec alertmanager-tmc-local-monitoring-tmc-local-0 -c alertmanager -- df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/sdd 2.0G 2.0G 0M 100% /data

Proceed to follow the steps in KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with tmc-local-monitoring.

Kafka

$ kubectl exec kafka-controller-0 -c kafka -- df -h /bitnami/kafka
Filesystem Size Used Avail Use% Mounted on
/dev/sdc 5G 5G 0M 100% /bitnami/kafka

Proceed to follow the steps in KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with kafka.

Postgres

$ kubectl exec postgres-postgresql-0 -c postgresql -- df -h /bitnami/postgresql
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 7.8G 7.8G 0M 100% /bitnami/postgresql

Proceed to follow the steps in KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with postgres.
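
The three exec-based checks above (AlertManager, Kafka, Postgres) can be run in one pass. This is a rough sketch that assumes the Pod names, container names, and data paths are exactly those shown in the commands above and that the current kubectl context already points at the namespace where TMC-SM is installed (no namespace flag is passed, matching the commands above). The function name check_tmc_volumes is hypothetical.

```shell
# Run the df check for each exec-capable StatefulSet Pod in turn.
# Each input line is: <pod> <container> <data path>, copied from the
# individual commands in this article.
check_tmc_volumes() {
  while read -r pod container path; do
    echo "== $pod ($path)"
    # </dev/null keeps kubectl from consuming the loop's input.
    kubectl exec "$pod" -c "$container" -- df -h "$path" </dev/null \
      || echo "could not exec into $pod" >&2
  done <<'EOF'
alertmanager-tmc-local-monitoring-tmc-local-0 alertmanager /data
kafka-controller-0 kafka /bitnami/kafka
postgres-postgresql-0 postgresql /bitnami/postgresql
EOF
}

# Usage:
#   check_tmc_volumes
```

Any Pod reporting 100% in the Use% column is a candidate for the steps in KB 96425.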

Prometheus

$ kubectl get pods
prometheus-server-tmc-local-monitoring-tmc-local-0 1/2 CrashLoopBackOff 1327 (3m52s ago) 13d

$ kubectl logs prometheus-server-tmc-local-monitoring-tmc-local-0 -c prometheus

ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open /prometheus/wal/00000582: no space left on device"

Proceed to follow the steps in KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with tmc-local-monitoring.
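
Because the Prometheus container is in CrashLoopBackOff, running df inside it may not be possible, so the log message is the signal to look for. The sketch below wraps that log check in a small function. The name logs_show_disk_full and the --tail value of 200 are illustrative assumptions, not part of the original procedure.

```shell
# Return 0 if the logs of the given Pod and container contain the
# "no space left on device" error, non-zero otherwise.
# $1 = Pod name, $2 = container name.
logs_show_disk_full() {
  kubectl logs "$1" -c "$2" --tail=200 2>/dev/null \
    | grep 'no space left on device' >/dev/null
}

# Usage:
#   logs_show_disk_full prometheus-server-tmc-local-monitoring-tmc-local-0 prometheus \
#     && echo "Prometheus volume appears to be full"
```

If the container restarted recently and the current logs are empty, adding the --previous flag to kubectl logs shows the logs of the prior container instance.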
