While setting up Disaster Recovery (DR) monitoring in HUB using Prometheus Postgres Exporter, there could be a need to monitor replication replay lag from the primary PostgreSQL database. PostgreSQL provides detailed replication lag information through the following query on the primary node:
SELECT
application_name,
state,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
Although this query returns valid time-based lag values for each connected replica, these fields are not exposed as Prometheus metrics by default in postgres_exporter.
The exporter does provide several replication-related metrics such as:
pg_replication_lag_seconds
pg_stat_replication_pg_wal_lsn_diff
pg_replication_slot_*
pg_stat_replication_reply_time
However, the metric pg_replication_lag_seconds typically reports:
0 on the primary
Meaningful lag values only when scraped from replica (standby/DR) nodes
This could raise questions about:
Exporting custom queries to capture write_lag, flush_lag, and replay_lag
Calculating time-based lag on the primary using pg_stat_replication_reply_time
1. Recommended Metric for Primary-Side Monitoring
For replication monitoring from the primary database, the supported and recommended metric is:
pg_stat_replication_pg_wal_lsn_diff
This metric:
Is exposed on the primary node
Reports WAL lag per replica
Measures replication delay in bytes
Requires no custom exporter configuration
Example Output:
pg_stat_replication_pg_wal_lsn_diff{
application_name="<app_name>",
client_addr="<IP>,
slot_name="<name>",
state="streaming"
} 0
This provides clear visibility into DR replica health directly from the primary.
2. Time-Based Replication Lag Monitoring
Time-based lag (pg_replication_lag_seconds) is not exposed per replica on the primary by postgres_exporter.
To monitor replication lag in seconds, follow this approach:
Scrape Prometheus metrics from all PostgreSQL nodes
Identify replica nodes using:
pg_replication_is_replica == 1
Use pg_replication_lag_seconds from replica nodes, not the primary
3. Custom Query Considerations
postgres_exporter does not export custom SQL queries by default
Calculating lag using:
current_time - pg_stat_replication_reply_time
is not supported via built-in exporter metrics
Direct SQL queries can still be used for manual checks or diagnostics, but not for Prometheus-based monitoring unless a custom exporter configuration is implemented