Prometheus and alertManager-prometheus pods stuck in an infinite terminating loop when deploying to Rancher

Article ID: 315873

Products

VMware Telco Cloud Service Assurance

Issue/Introduction

After a successful upgrade, the Prometheus and Alertmanager pods are stuck in an infinite terminating loop, even though the product reports a healthy status.

Example:
# kubectl get tcxproduct

NAME   STATUS            READY   MESSAGE                               AGE
tcsa   updateCompleted   True    All App CRs reconciled successfully   2d14h

Prometheus errors:

The prometheus-prometheus-kube-prometheus-prometheus-0 and alertmanager-prometheus-kube-prometheus-alertmanager-0 pods are in a terminating loop.

Rancher Error:

alertmanager-prometheus-kube-prometheus-alertmanager-tls-assets not found.

The Rancher GUI displays the pod in a creating state.

Environment

1.4.x, 2.x, 3.x

Cause

TCSA deployment on Rancher is not officially supported.

Two operators, rancher-monitoring-operator and the Prometheus operator from the TCSA installation package, try to reconcile the same resources (the Alertmanager and Prometheus pods). Because both act on the same pods, the pods are repeatedly terminated and recreated.
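One way to confirm that two operators are present is to list the operator deployments across all namespaces. The sketch below is only illustrative: the function name and the grep pattern are assumptions, and the actual deployment names on a given cluster may differ.

```shell
# Hypothetical helper: list deployments that look like monitoring operators.
# Assumption: the deployment names contain "monitoring-operator" or
# "prometheus-operator"; adjust the pattern to match your cluster.
list_monitoring_operators() {
  kubectl get deployments --all-namespaces \
    | grep -i -E 'monitoring-operator|prometheus-operator'
}
```

Running list_monitoring_operators and seeing more than one match suggests that both operators are active on the cluster.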

Resolution

Stop the rancher-monitoring-operator by scaling its deployment to 0 replicas.

The Alertmanager and Prometheus pods are then created and start successfully.
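The scale-down can be sketched with kubectl as follows. The namespace cattle-monitoring-system is an assumption (it is a common location for Rancher monitoring) and the function names are hypothetical; confirm the namespace and deployment name on the cluster before running anything.

```shell
# Assumption: the operator runs in cattle-monitoring-system under the
# deployment name rancher-monitoring-operator. Override either variable
# if your cluster differs.
NAMESPACE="${NAMESPACE:-cattle-monitoring-system}"
DEPLOYMENT="${DEPLOYMENT:-rancher-monitoring-operator}"

# Scale the Rancher operator to zero replicas so that only the TCSA
# Prometheus operator reconciles the Alertmanager and Prometheus pods.
stop_rancher_monitoring_operator() {
  kubectl -n "$NAMESPACE" scale deployment "$DEPLOYMENT" --replicas=0
}

# List pods in all namespaces and keep only the two affected pods,
# so their status can be checked after the scale-down.
check_monitoring_pods() {
  kubectl get pods --all-namespaces \
    | grep -E 'prometheus-kube-prometheus-(prometheus|alertmanager)-0'
}
```

Call stop_rancher_monitoring_operator first, then check_monitoring_pods to see whether both pods have left the terminating loop.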