Metrics Memory Usage High alarm observed on SSP UI
SSP 5.0 and NSX 4.2.1
The memory alarm on the Metrics Postgres instance is primarily triggered by a high volume of metrics being ingested and processed. This typically indicates that the current memory allocation is insufficient to handle the load efficiently. The spike in memory usage is largely driven by the scale of metric reporters in the system, particularly those related to large-scale components such as LSPs, DFW, and IDPS. Each of these components contributes significantly to the overall metrics footprint, depending on their deployment size.
Note: This issue is fixed in SSP 5.1.
STEP1: Log in to the SSPI CLI with root credentials, verify that all metrics pods are up and running, and then set the metrics replicas using the commands below:
alias kn='k -n nsxi-platform'
kn get pods | grep metrics
kn scale deployment metrics-manager --replicas=2
kn scale deployment metrics-app-server --replicas=1
kn scale deployment metrics-query-server --replicas=1
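The pod check in this step can also be scripted. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `not_healthy` is hypothetical and not part of SSP.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# every metrics pod that is not Running or whose READY count is incomplete.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
not_healthy() {
    awk '$1 ~ /metrics/ {
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}

# Example usage on the SSPI CLI (kn is the alias defined above):
#   kn get pods | not_healthy
# No output means every metrics pod reports Running and fully ready.
```

Pods of finished jobs (status Completed), if any exist, would also be listed by this filter and can be ignored.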
STEP2: Update the below values of metrics-postgresql-ha-pgpool using the command:
kn edit deployment/metrics-postgresql-ha-pgpool
Initial values:
PGPOOL_NUM_INIT_CHILDREN=200
PGPOOL_MAX_POOL=2
Updated values:
PGPOOL_NUM_INIT_CHILDREN=80
PGPOOL_MAX_POOL=1
This will restart metrics-postgresql-ha-pgpool. Wait for both replicas to be running, using the below command:
kn get pods | grep metrics-postgresql-ha-pgpool
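As an alternative to polling `get pods`, `kubectl rollout status` blocks until a rollout finishes or a timeout expires. The following is a minimal sketch, assuming that `k` on the SSPI CLI is an alias for `kubectl` (the steps above do not state this); the wrapper name `wait_rollout`, the `KUBECTL` variable, and the 300-second default are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical wrapper around `kubectl rollout status`.
# KUBECTL is an assumed override so the helper can point at another binary.
KUBECTL="${KUBECTL:-kubectl}"

wait_rollout() {
    # $1: resource, e.g. deployment/metrics-postgresql-ha-pgpool
    # $2: optional timeout (defaults to 300s, an arbitrary choice)
    "$KUBECTL" -n nsxi-platform rollout status "$1" --timeout="${2:-300s}"
}

# Example:
#   wait_rollout deployment/metrics-postgresql-ha-pgpool
```

The command returns a non-zero exit code if the rollout does not complete within the timeout, which makes it easier to notice a stuck restart than repeated manual checks.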
STEP3: Update the memory-related configurations in the configmap using the below command:
kn edit configmap metrics-postgresql-ha-postgresql-extended-configuration
Update the following parameters in the data section:
shared_buffers = 1500MB
work_mem = 8MB
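After saving the configmap, the two parameters can be read back to confirm the edit. The following is a minimal sketch; the function name `show_mem_settings` is hypothetical, and the filter only assumes that each parameter appears at the start of its own line in the configmap data.

```shell
#!/bin/sh
# Hypothetical helper: reads configmap YAML on stdin and prints only the
# shared_buffers and work_mem lines (maintenance_work_mem and similar
# parameters are deliberately not matched).
show_mem_settings() {
    grep -E '^[[:space:]]*(shared_buffers|work_mem)[[:space:]]*[=:]'
}

# Example usage on the SSPI CLI:
#   kn get configmap metrics-postgresql-ha-postgresql-extended-configuration -o yaml | show_mem_settings
```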
STEP4: Update POSTGRESQL_MAX_CONNECTIONS using the below command:
kn set env sts/metrics-postgresql-ha-postgresql POSTGRESQL_MAX_CONNECTIONS=180
This will restart metrics-postgresql-ha-postgresql-0. Wait for the pod to be in a running state, then verify that all pods are up and running, using the below command:
kn get pods | grep metrics-postgresql-ha-postgresql-0
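The values chosen in the steps above can be sanity-checked with simple arithmetic. The Pgpool-II documentation states that each pgpool instance may open up to num_init_children * max_pool backend connections, and that this total should stay within PostgreSQL's max_connections. The sketch below assumes two pgpool replicas (implied by "both replicas" in STEP2) and treats shared_buffers + max_connections * work_mem as a rough memory figure only; actual usage can differ, since a single query may use more than one work_mem allocation.

```shell
#!/bin/sh
# Values taken from STEP2 to STEP4 of this article.
PGPOOL_REPLICAS=2        # assumption: "both replicas" in STEP2
NUM_INIT_CHILDREN=80
MAX_POOL=1
MAX_CONNECTIONS=180
SHARED_BUFFERS_MB=1500
WORK_MEM_MB=8

# Upper bound on backend connections that pgpool can open in total.
backend_conns=$((PGPOOL_REPLICAS * NUM_INIT_CHILDREN * MAX_POOL))

# Rough memory figure: shared buffers plus one work_mem per connection.
rough_mem_mb=$((SHARED_BUFFERS_MB + MAX_CONNECTIONS * WORK_MEM_MB))

echo "pgpool backend connections: $backend_conns (max_connections: $MAX_CONNECTIONS)"
echo "rough memory figure: ${rough_mem_mb} MB"
```

With the updated values the bound is 2 * 80 * 1 = 160, which fits under 180. With the initial values (200 and 2), the same formula gives 2 * 200 * 2 = 800.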
If the alarm is still observed, contact Broadcom Support for further troubleshooting.