Prometheus and alertManager-prometheus pods stuck in an infinite terminating loop when deploying to Rancher

Article ID: 315873

Products

VMware Telco Cloud Service Assurance

Issue/Introduction

After a successful upgrade, the Prometheus and alertManager-prometheus pods are stuck in an infinite termination loop.

Example:
# kubectl get tcxproduct

NAME   STATUS              READY   MESSAGE                               AGE
tcsa   update. Completed   True    All App CRs reconciled successfully   2d14h

Prometheus errors:

The prometheus-prometheus-kube-prometheus-prometheus-0 and alertmanager-prometheus-kube-prometheus-alertmanager-0 pods are in a terminating loop.

Rancher Error:

alertmanager-prometheus-kube-prometheus-alertmanager-tls-assets not found.

The Rancher GUI displays the pod in a creating state.

Environment

1.4.x, 2.x, 3.x

Cause

TCSA deployment on Rancher is not officially supported.

Two operators, rancher-monitoring-operator and the Prometheus-operator from the TCSA installation package, both try to reconcile the same resources (the AlertManager and Prometheus pods). As a result, the pods are repeatedly terminated and recreated.
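To confirm that more than one operator is present, the deployments in the cluster can be listed and filtered by name. The following is a minimal sketch, not an official procedure: the function name list_operator_deployments and the KUBECTL override variable are hypothetical helpers introduced here for illustration, and the filter simply matches any deployment whose line contains "operator".

```shell
# Hypothetical helper: list every deployment whose line contains "operator".
# KUBECTL is an assumed override (defaults to kubectl) so the binary can be swapped.
list_operator_deployments() {
  "${KUBECTL:-kubectl}" get deployments --all-namespaces | grep -i 'operator'
}
```

Running list_operator_deployments on an affected cluster would be expected to show both the Rancher monitoring operator and the Prometheus operator installed by TCSA, each with its namespace in the first column.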

Resolution

Stop the rancher-monitoring-operator by scaling its deployment to 0 replicas.

Once the operator is stopped, the AlertManager and Prometheus pods are created and start successfully.
