Metrics service is down, causing Metrics Delivery Failure errors on all hosts.

Article ID: 389720

Products

VMware vDefend Firewall

Issue/Introduction

  • NAPP is degraded and Metrics is shown as down in the NSX UI.

  • The following pods are not ready (the READY column shows 0/1):

napp-k get pods | grep 0/1

metrics-db-helper-76f75ffb96-4wcnx                               0/1     Running     2442 (6m13s ago)   74d
metrics-manager-7bfb69fd77-784dv                                 0/1     Running     2533 (4m45s ago)   74d
metrics-manager-7bfb69fd77-zdrr8                                 0/1     Running     2561 (5m13s ago)   74d
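
As a rough alternative to `grep 0/1`, the READY column can be parsed so that any pod with fewer ready containers than expected is listed (for example 1/2 as well as 0/1). This is only a sketch: `not_ready_pods` is a hypothetical helper name, and it assumes the usual `get pods` column layout shown above.

```shell
#!/bin/sh
# Print the names of pods whose READY column (for example 0/1) shows fewer
# ready containers than expected. Reads `get pods` style output on stdin.
# not_ready_pods is a hypothetical helper, not part of NAPP or napp-k.
not_ready_pods() {
  awk '
    $2 ~ /^[0-9]+\/[0-9]+$/ {   # skip the header row and blank lines
      split($2, r, "/")
      if (r[1] + 0 < r[2] + 0) print $1
    }'
}

# Intended use on the NSX Manager (not executed here):
#   napp-k get pods | not_ready_pods
```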

  • The metrics-manager pod logs show the following errors:

napp-k logs metrics-manager-7bfb69fd77-784dv

+ psql -a postgresql://metrics-postgresql-ha-pgpool:5432/metrics -U postgres -f /opt/vmware/nsx/pace/scripts/schema-upgrade.sql
psql: error: connection to server at "metrics-postgresql-ha-pgpool" (10.102.##.##), port 5432 failed: ERROR:  unable to read message kind
DETAIL:  kind does not match between main(45) slot[1] (53)

+ psql -a postgresql://metrics-postgresql-ha-pgpool:5432/metrics -U postgres -f /opt/vmware/nsx/pace/scripts/schema-upgrade.sql
psql: error: FATAL:  database "metrics" does not exist

  • The pgpool pod logs show the following error:

napp-k logs metrics-postgresql-ha-pgpool-674b8fd666-nr57s

2025-03-03 20:19:39: pid 159: LOG:  pool_read_kind: error message from main backend:database "metrics" does not exist
2025-03-03 20:19:39: pid 159: ERROR:  unable to read message kind
2025-03-03 20:19:39: pid 159: DETAIL:  kind does not match between main(45) slot[1] (53)

 

Environment

NAPP 4.2.0.1

Cause

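The metrics database is missing from the metrics PostgreSQL instance, so metrics-manager and pgpool fail with the error: database "metrics" does not exist.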

Resolution

Run the following commands as root on the NSX Manager CLI.

1. napp-k exec -it metrics-postgresql-ha-postgresql-0 bash
2. PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1
3. \l

The output of step 3 should look similar to the following:
List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges   
-----------+----------+----------+---------+-------+-----------------------
 metrics   | postgres | UTF8     | C       | C     | 
 postgres  | postgres | UTF8     | C       | C     | 
 repmgr    | postgres | UTF8     | C       | C     | 
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
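
The same check can be done without reading the table by eye. The sketch below assumes the connection settings from step 2 and uses the standard psql options `-A` (unaligned), `-t` (tuples only) and `-c` to list database names from the `pg_database` catalog; `has_database` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit 0) if the given name appears as a whole line in a list of
# database names read from stdin, one name per line.
# has_database is a hypothetical helper, not part of NAPP or PostgreSQL.
has_database() {
  grep -qxF -- "$1"
}

# Intended use inside the metrics-postgresql-ha-postgresql-0 pod (not executed here):
#   PGPASSWORD=$POSTGRES_PASSWORD psql -w -U postgres -d postgres -h 127.0.0.1 \
#     -At -c 'SELECT datname FROM pg_database' | has_database metrics \
#     && echo "metrics database present" || echo "metrics database missing"
```

Matching whole lines with `-x` avoids a false positive from a database whose name merely contains "metrics".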

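If the metrics database is not listed, exit psql and the pod shell (quit, then exit), then scale the PostgreSQL StatefulSet down to zero replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=0

Then scale it back up to two replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=2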

Wait for all the metrics-postgresql* pods to come up.
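
Waiting for the pods can be scripted with a simple polling loop. This is a minimal sketch rather than a supported procedure: `wait_until` and `pg_pods_ready` are hypothetical names, and the timeout and interval values are arbitrary.

```shell
#!/bin/sh
# Poll a command until it succeeds or a timeout (in whole seconds) expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# wait_until is a hypothetical helper, not part of NAPP or napp-k.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # gave up: the command never succeeded within the timeout
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Intended use on the NSX Manager (not executed here). The readiness check
# assumes every metrics-postgresql* pod runs a single container (READY 1/1):
#   pg_pods_ready() { ! napp-k get pods | grep metrics-postgresql | grep -q ' 0/1 '; }
#   wait_until 600 10 pg_pods_ready && echo "pods ready" || echo "timed out"
```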

Repeat steps 1 to 3 and check whether the metrics database is now listed.

If the metrics database is still missing, scale the StatefulSet to three replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=3

Wait for all the metrics-postgresql* pods to come up.

Repeat steps 1 to 3 and check again.

If the metrics database is still missing, scale the StatefulSet back to two replicas:

napp-k scale statefulsets metrics-postgresql-ha-postgresql --replicas=2

Wait for all the metrics-postgresql* pods to come up.

Then create the metrics database manually:

1. napp-k exec -it metrics-postgresql-ha-postgresql-0 bash
2. PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1
3. CREATE DATABASE metrics;
4. quit
5. exit

All pods should come up after some time. To speed this up, delete the failed pods:

napp-k delete pod <pod-name>