Tanzu Mission Control Self-Managed Prometheus-Server Pod CrashLoopBackOff - OOMKilled


Article ID: 409538


Products

VMware Tanzu Mission Control - SM

Issue/Introduction

In a cluster running Tanzu Mission Control Self-Managed (TMC SM) system pods, the prometheus-server pod is in CrashLoopBackOff status. The exact name of the prometheus-server pod varies by environment:

kubectl get pods -n <tmc sm namespace> | grep prometheus

NAME                 READY   STATUS
prometheus-server    1/2     CrashLoopBackOff

Describing the prometheus-server pod shows that its prometheus container is repeatedly being terminated with the reason OOMKilled:

kubectl describe pod -n <tmc sm namespace> <prometheus server pod name>

prometheus:
      State:    CrashLoopBackOff
      Reason: OOMKilled
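
The same check can be scripted instead of reading the full describe output. The following is a minimal sketch, not an official TMC SM procedure: the function name last_termination_reason is hypothetical, and the container name defaults to prometheus as shown in the describe output above. It relies only on the standard kubectl get -o jsonpath option.

```shell
# Sketch: print why the named container in a pod was last terminated.
# Arguments: <namespace> <pod name> [container name]
# Assumption: the container is named "prometheus" unless a third argument is given.
last_termination_reason() {
  local namespace="$1"
  local pod="$2"
  local container="${3:-prometheus}"
  # Filter containerStatuses by container name and read the last terminated reason.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$container\")].lastState.terminated.reason}"
}

# Example (substitute the values from your environment):
#   last_termination_reason <tmc sm namespace> <prometheus server pod name>
```

For a pod affected by this issue, the command is expected to print OOMKilled.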

Environment

Tanzu Mission Control Self-Managed (TMC SM) 1.4.2

Cause

The default Prometheus-server memory limit is set to 1Gi.

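Depending on the environment, prometheus-server may need more memory than this limit allows. When the container exceeds the limit, it is terminated with OOMKilled and restarts repeatedly, which leaves the pod in CrashLoopBackOff.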
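
To confirm which memory limit is currently in effect, the limit can be read from the pod spec. This is a minimal sketch under the same assumptions as the article: the function name container_memory_limit is hypothetical, and the container name defaults to prometheus.

```shell
# Sketch: print the memory limit currently set on the named container.
# Arguments: <namespace> <pod name> [container name]
# Assumption: the container is named "prometheus" unless a third argument is given.
container_memory_limit() {
  local namespace="$1"
  local pod="$2"
  local container="${3:-prometheus}"
  # Filter spec.containers by container name and read its memory limit.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath="{.spec.containers[?(@.name==\"$container\")].resources.limits.memory}"
}

# Example (substitute the values from your environment):
#   container_memory_limit <tmc sm namespace> <prometheus server pod name>
```

On an installation that still uses the default, this should report 1Gi.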

Resolution

The values YAML file used for the initial installation of Tanzu Mission Control Self-Managed (TMC SM) has a dedicated parameter for the prometheus-server memory limit, which can be raised to suit the environment.

Note: Changes made directly to the pods, or to the Kubernetes objects that manage them (Deployment, StatefulSet, DaemonSet, etc.), are automatically reverted to the defaults or to the values configured in the YAML used to install TMC SM.

  1. Update the values YAML used for the TMC SM install to change the prometheus memory limit value according to your environment's needs:
    prometheus:
        memoryLimit: <desired value in Gi>Gi
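
Before editing the values file, it can help to sanity-check the new value so that a typo such as 2GB or 2gi does not reach the installation. The following is a minimal sketch: the function name is_gi_quantity is hypothetical, and it deliberately accepts only a positive whole number followed by Gi, the form shown in the step above. Kubernetes itself accepts other quantity formats, which this check does not cover.

```shell
# Sketch: succeed only if the argument is a positive whole number followed by "Gi",
# for example 2Gi. Other valid Kubernetes quantity forms are intentionally rejected.
is_gi_quantity() {
  local value="$1"
  local number="${value%Gi}"          # strip the trailing "Gi" suffix, if present
  # If nothing was stripped, the suffix is missing or misspelled.
  [ "$number" != "$value" ] || return 1
  case "$number" in
    ''|0*|*[!0-9]*) return 1 ;;       # empty, leading zero, or non-digit characters
  esac
  return 0
}

# Example:
#   is_gi_quantity 2Gi && echo "value looks valid"
```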

Additional Information

Additional configurable parameters are listed in the following TMC SM documentation:

Configuration Key Values for Installing Tanzu Mission Control Self-Managed