tca-monitor-operator pod in CrashLoopBackOff state due to OOM

Article ID: 387638

Products

VMware Telco Cloud Automation

Issue/Introduction

For workload clusters in TCA 3.2, the tca-monitor-operator pod may enter a CrashLoopBackOff state because it is repeatedly OOM-killed.
Use the command below to check the pod status:

kubectl get pod -A | grep tca-monitor

Describing the pod shows the following reason:

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
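The state above can be retrieved by describing the pod. The exact pod name varies per cluster, so the suffix below is a placeholder:

```shell
# List the monitor pods to find the exact pod name
# (the namespace is tca-system on TCA workload clusters)
kubectl get pod -n tca-system | grep tca-monitor

# Describe the pod; look for "OOMKilled" under "Last State"
kubectl describe pod -n tca-system <tca-monitor-operator-pod-name>
```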

Environment

3.1.x
3.2

Cause

This is caused by the large number and size of secrets in the cluster: reading them consumes all of the memory allowed for this container, which is eventually OOM-killed.

Resolution

Resolved in TCA 3.3

The workaround is to pause the tca-monitor-operator package and raise the memory limit on its deployment.
Follow the steps below:

  1. In the workload cluster context, run the command below to stop the tca-monitor-operator package from being reconciled, which would otherwise revert any changes made to the tca-monitor-operator deployment:
    kubectl patch pkgi -n tca-system tca-monitor-operator --type merge -p '{"spec": {"paused": true}}' 
  2. After pausing the package, run "kubectl edit" to edit the tca-monitor-operator deployment and increase its memory limit:
    kubectl edit deploy -n tca-system tca-monitor-operator
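The limit can also be set non-interactively with a patch. The container index and the 1Gi value below are assumptions, not values from this article; verify both against your deployment before applying:

```shell
# Non-interactive alternative to "kubectl edit". The container index (0) and
# the new 1Gi limit are assumptions - verify both against your deployment.
kubectl patch deploy -n tca-system tca-monitor-operator --type json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/resources/limits/memory",
   "value": "1Gi"}
]'

# Confirm the new limit took effect
kubectl get deploy -n tca-system tca-monitor-operator \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
```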

Additional Information

The paused state does not affect the monitoring operations performed by the monitor-operator. The only required action comes at upgrade time: once TCA 3.3 or a later release is available and users want to update the workload cluster to a version supported by the new TCA release, they should unpause this package before updating the workload cluster, so the package is updated together with it.
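Unpausing mirrors the pause command from the Resolution section:

```shell
# Resume reconciliation of the tca-monitor-operator package before upgrading.
# Note: once reconciliation resumes, manual edits to the deployment (such as
# the raised memory limit) may be reverted by the package.
kubectl patch pkgi -n tca-system tca-monitor-operator --type merge -p '{"spec": {"paused": false}}'
```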