Prometheus server failing every 5 minutes

Article ID: 423461

Updated On:

Products

VCF Operations

Issue/Introduction

  • Prometheus server CPU utilization spikes every 5 minutes.
  • The Prometheus server might restart every 5 minutes, which corresponds to the collection cycle.

Environment

Aria Operations Management Pack for Kubernetes - Version 2.2

Cause

This issue is caused by custom user-defined Prometheus queries.
When the custom user-defined Prometheus queries (set in the configuration files) are complex or resource-intensive, they can overload Prometheus and cause the Prometheus Pod to restart. To avoid this, reduce the number of parallel threads that query Prometheus.
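
The effect of the thread count can be illustrated with a small, generic Python sketch. It is not the adapter's code: `peak_concurrency` and `fake_query` are hypothetical names, and the sleep only stands in for a slow Prometheus query. It shows that the pool size puts an upper bound on how many queries are in flight at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(pool_size: int, num_queries: int = 12) -> int:
    """Run simulated queries on a bounded pool and return the peak number in flight."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_query(_):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stand-in for a slow, resource-intensive query
        with lock:
            in_flight -= 1

    # max_workers caps how many fake_query calls can run at the same time
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(fake_query, range(num_queries)))
    return peak
```

With a pool size of 3, no more than three simulated queries ever overlap, regardless of how many are queued for the collection cycle.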

Resolution

Reduce the number of concurrent threads that make calls to Prometheus in the Kubernetes management pack:

  1. SSH to the Aria Operations node where the Kubernetes adapter is running.
  2. Edit the properties file "/usr/lib/vmware-vcops/user/plugins/inbound/KubernetesAdapter3/conf/kubernetes_adapter.properties" and reduce the value of "THREAD_POOL_SIZE" to a conservative value of around 3.
  3. Save the file, then restart the adapter.
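
Step 2 can also be scripted. The following is a minimal Python sketch, not an official tool. It assumes the file uses plain `KEY=value` lines and that `THREAD_POOL_SIZE` is already present and uncommented; the helper names `set_property` and `update_properties_file` are hypothetical. It keeps a `.bak` copy of the file before writing.

```python
import re
import shutil
from pathlib import Path


def set_property(text: str, key: str, value: str) -> str:
    """Return the properties text with the first uncommented `key` line set to `value`."""
    # Match "KEY=old" or "KEY = old" at the start of a line; commented lines do not match
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*[=:][ \t]*).*$", re.MULTILINE)
    if not pattern.search(text):
        raise KeyError(f"{key} not found; edit the file manually")
    return pattern.sub(lambda m: m.group(1) + value, text, count=1)


def update_properties_file(path: Path, key: str, value: str) -> None:
    """Back up the file next to itself, then rewrite it with the new value."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(set_property(path.read_text(), key, value))
```

On the node, this could be called as `update_properties_file(Path("/usr/lib/vmware-vcops/user/plugins/inbound/KubernetesAdapter3/conf/kubernetes_adapter.properties"), "THREAD_POOL_SIZE", "3")`. The adapter still needs to be restarted afterwards, as in step 3.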