Metrics service is down, causing Metrics Delivery Failure errors on all hosts.

Products

VMware vDefend Firewall

Issue/Introduction

NAPP is degraded and the Metrics service is reported as down in the NSX UI.
The following pods are not ready (READY column shows 0/1):

napp-k get pods | grep 0/1

metrics-db-helper-76f75ffb96-4wcnx 0/1 Running 2442 (6m13s ago) 74d
metrics-manager-7bfb69fd77-784dv 0/1 Running 2533 (4m45s ago) 74d
metrics-manager-7bfb69fd77-zdrr8 0/1 Running 2561 (5m13s ago) 74d
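The grep for 0/1 only matches single-container pods. As a rough alternative, the sketch below lists every pod whose READY column shows fewer ready containers than expected. The function name not_ready_pods is hypothetical, and the sketch assumes the default column layout of the pod listing (NAME, READY, STATUS, ...). Completed job pods will also be listed, so review the output by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of pods that are not fully ready.
# Reads pod listing text on stdin and assumes the READY column (field 2)
# has the form "ready/total", for example "0/1".
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] + 0 < r[2] + 0) print $1
       }'
}

# Usage on the NSX Manager (not executed here):
#   napp-k get pods | not_ready_pods
```

The header row is skipped automatically because its second field is not of the form ready/total.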

The metrics-manager pod logs show the following error.

napp-k logs metrics-manager-7bfb69fd77-784dv

+ psql -a postgresql://metrics-postgresql-ha-pgpool:5432/metrics -U postgres -f /opt/vmware/nsx/pace/scripts/schema-upgrade.sql
psql: error: connection to server at "metrics-postgresql-ha-pgpool" (10.102.##.##), port 5432 failed: ERROR: unable to read message kind
DETAIL: kind does not match between main(45) slot[1] (53)

+ psql -a postgresql://metrics-postgresql-ha-pgpool:5432/metrics -U postgres -f /opt/vmware/nsx/pace/scripts/schema-upgrade.sql
psql: error: FATAL: database "metrics" does not exist

The pgpool pod logs show the following error.

napp-k logs metrics-postgresql-ha-pgpool-674b8fd666-nr57s

2025-03-03 20:19:39: pid 159: LOG: pool_read_kind: error message from main backend:database "metrics" does not exist
2025-03-03 20:19:39: pid 159: ERROR: unable to read message kind
2025-03-03 20:19:39: pid 159: DETAIL: kind does not match between main(45) slot[1] (53)

Environment

NAPP 4.2.0.1

Cause

The metrics database does not exist in the PostgreSQL cluster, so the metrics pods cannot connect to it.

Resolution

Execute the following commands as root on the NSX Manager CLI.

1. napp-k exec -it metrics-postgresql-ha-postgresql-0 bash
2. PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1
3. \l

The output of step 3 should look similar to the following.

                              List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 metrics   | postgres | UTF8     | C       | C     |
 postgres  | postgres | UTF8     | C       | C     |
 repmgr    | postgres | UTF8     | C       | C     |
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
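The check for the metrics database can also be done without reading the table by eye. The following is a minimal sketch, not part of the documented procedure: has_database is a hypothetical helper that reads the database listing on stdin and succeeds if the given name appears in the first column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the named database appears in the first
# column of a psql database listing read from stdin, exit 1 otherwise.
has_database() {
  awk -F'|' -v db="$1" '
    { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }  # trim padding
    name == db { found = 1 }
    END { exit(found ? 0 : 1) }'
}

# Usage inside the metrics-postgresql-ha-postgresql-0 pod (not executed here):
#   PGPASSWORD=$POSTGRES_PASSWORD psql -w -U postgres -d postgres \
#     -h 127.0.0.1 -c '\l' | has_database metrics && echo "metrics present"
```

Matching on the whole trimmed first column avoids false positives from names that merely contain the word metrics.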

If the metrics database is not listed, scale the PostgreSQL StatefulSet down to zero replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=0

Then scale it back up to two replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=2

Wait for all the metrics-postgresql* pods to come up.
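Waiting for the pods can be scripted instead of rechecking by hand. The sketch below is one possible approach under stated assumptions: pods_ready is a hypothetical helper, it reads the pod listing on stdin in the default column layout, and it treats a pod as up only when READY shows all containers ready and STATUS is Running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if at least one pod name starts with the
# given prefix and every such pod is fully ready and in Running status.
pods_ready() {
  awk -v p="$1" '
    index($1, p) == 1 {
      seen = 1
      split($2, r, "/")
      if (r[1] != r[2] || $3 != "Running") bad = 1
    }
    END { exit((seen && !bad) ? 0 : 1) }'
}

# Usage on the NSX Manager (not executed here): poll every 10 seconds.
#   until napp-k get pods | pods_ready metrics-postgresql; do sleep 10; done
```

The same loop can be reused after each later scaling step in this procedure.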

Repeat steps 1 through 3 and check whether the metrics database is now listed.

If the metrics database is still missing, scale the StatefulSet to three replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=3

Wait for all the metrics-postgresql* pods to come up.

Repeat steps 1 through 3 and check whether the metrics database is now listed.

If the metrics database is still missing, scale the StatefulSet back to two replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=2

Wait for all the metrics-postgresql* pods to come up.

Then create the metrics database manually:

1. napp-k exec -it metrics-postgresql-ha-postgresql-0 bash
2. PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1
3. CREATE DATABASE metrics;
4. quit
5. exit

All the pods should come up after some time. To speed this up, delete the failed pods:

napp-k delete pod <pod-name>