
SSP: Metrics Memory Usage High alarm observed on SSP UI due to metrics services scale out


Article ID: 401844



Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

A "Metrics Memory Usage High" alarm is observed on the SSP UI.

Environment

SSP 5.0 and NSX 4.2.1

Cause

The memory alarm on the Metrics Postgres instance is primarily triggered by a high volume of metrics being ingested and processed. This typically indicates that the current memory allocation is insufficient to handle the load efficiently. The spike in memory usage is largely driven by the scale of metric reporters in the system, particularly those related to large-scale components such as LSPs, DFW, and IDPS. Each of these components contributes significantly to the overall metrics footprint, depending on their deployment size.
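
To confirm that the Metrics Postgres pod is the source of the memory pressure, its current usage can be checked with kubectl top (assuming the Kubernetes metrics-server is available in the cluster):

kubectl -n nsxi-platform top pod | grep metrics-postgresql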

Resolution

Note: This issue is fixed in SSP 5.1.

STEP 1: Check that all metrics pods are up and running by executing the commands below in the SSPI CLI using root credentials. If all metrics pods are running, set the replicas as shown.

alias kn='kubectl -n nsxi-platform'   # shorthand for kubectl scoped to the nsxi-platform namespace

 kn get pods | grep metrics  

kn scale deployment metrics-manager --replicas=2
kn scale deployment metrics-app-server --replicas=1
kn scale deployment metrics-query-server --replicas=1
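
To confirm that the new replica counts have been applied, the deployments can be listed (a quick check using the same deployment names scaled above):

kn get deployment metrics-manager metrics-app-server metrics-query-server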


STEP 2: Update the following environment variables of the metrics-postgresql-ha-pgpool deployment using the command below.

kn edit deployment/metrics-postgresql-ha-pgpool


Initial values:
PGPOOL_NUM_INIT_CHILDREN=200
PGPOOL_MAX_POOL=2

Updated values:
PGPOOL_NUM_INIT_CHILDREN=80
PGPOOL_MAX_POOL=1
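
As an alternative to editing the deployment interactively, the same change can be applied non-interactively with kubectl set env (a minimal sketch; confirm the variable names match those defined on the pgpool container before applying):

kn set env deployment/metrics-postgresql-ha-pgpool PGPOOL_NUM_INIT_CHILDREN=80 PGPOOL_MAX_POOL=1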


This will restart metrics-postgresql-ha-pgpool. Wait for both replicas to be running, checking with the command below:

kn get pods | grep metrics-postgresql-ha-pgpool  
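
The rollout can also be followed directly; the command below returns once all updated replicas are available:

kn rollout status deployment/metrics-postgresql-ha-pgpool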


STEP 3: Update memory-related configuration in the configmap using the command below:


kn edit configmap metrics-postgresql-ha-postgresql-extended-configuration


Update the following parameters in the data section:

    shared_buffers = 1500MB

    work_mem = 8MB
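
A change to shared_buffers in PostgreSQL only takes effect after the server restarts; the pod restart triggered in STEP 4 will apply it. To confirm the edit was saved, the configmap can be printed back:

kn get configmap metrics-postgresql-ha-postgresql-extended-configuration -o yaml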


STEP 4: Update POSTGRESQL_MAX_CONNECTIONS using the command below:


kn set env sts/metrics-postgresql-ha-postgresql POSTGRESQL_MAX_CONNECTIONS=180
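
Once the statefulset has picked up the change, the environment variables can be listed to confirm the new value:

kn set env sts/metrics-postgresql-ha-postgresql --list | grep POSTGRESQL_MAX_CONNECTIONS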


This will restart metrics-postgresql-ha-postgresql-0. Wait for the pod to be in a running state and verify that all pods are up and running:

kn get pods | grep metrics-postgresql-ha-postgresql-0
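
The statefulset rollout can also be followed until the pod is back, and then the full set of metrics pods re-checked:

kn rollout status sts/metrics-postgresql-ha-postgresql
kn get pods | grep metrics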


Additional Information

If the alarm is still observed after applying the above changes, engage Broadcom Support for further troubleshooting.