You are looking for better monitoring of Checkpoints with the APMIA PostgreSQL extension.
Currently there are three metrics related to Checkpoints monitoring.
Checkpoints:Checkpoints Requested
Checkpoints:Checkpoints Scheduled
Checkpoints:Time Spent Where Files Are Written To Disk (ms)
The full metric path is:
SuperDomain|<hostname>|Infrastructure|Agent|Postgres Databases|<database server IP/hostname>|postgres|....
The metric Checkpoints:Time Spent Where Files Are Written To Disk (ms) appears to report a cumulative value, which makes it of little use for detecting a problem with a Checkpoint.
Your requirement is a value on which you can set a threshold and receive an alert if there is an issue with a Checkpoint during a given time window.
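One common way to make a cumulative counter usable with a threshold is to alert on the change between two consecutive samples instead of the raw value. The following is a minimal Python sketch of that idea only. The function names and the reset handling are assumptions made for illustration and are not part of the APMIA agent.

```python
def windowed_delta(previous: float, current: float) -> float:
    """Return how much a cumulative counter grew between two samples."""
    if current < previous:
        # Assumption: a counter that goes backwards means the statistics
        # were reset, so the current value is the growth since the reset.
        return current
    return current - previous


def exceeds_threshold(previous_ms: float, current_ms: float,
                      threshold_ms: float) -> bool:
    """True if the time added during the window is above the threshold."""
    return windowed_delta(previous_ms, current_ms) > threshold_ms


if __name__ == "__main__":
    # Two hypothetical samples of the cumulative write-time metric (ms).
    print(windowed_delta(1000.0, 1500.0))
    print(exceeds_threshold(1000.0, 1500.0, threshold_ms=400.0))
```

The sampling interval defines the time window, so the threshold should be chosen with that interval in mind.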
The APMIA Infrastructure Agent Engineering/DEV team has reviewed this Enhancement Request.
They plan to add the following functionality to the APMIA Infrastructure Agent PostgreSQL Extension.
Once the Enhancement Request is complete, this functionality will be available in the out-of-the-box agent Extension.
Implementing the following:
Checkpoint Write Ratio
checkpoint_write_ratio = (checkpoints_req / (checkpoints_timed + checkpoints_req)) * 100
checkpoints_req: Checkpoints forced by reaching max_wal_size (or checkpoint_segments in PostgreSQL versions before 9.5); these are emergency/forced checkpoints
checkpoints_timed: Checkpoints that happened on schedule due to checkpoint_timeout
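The formula above can be sketched in Python as follows. This is an illustrative helper, not the agent's implementation. The function name is hypothetical, and returning 0.0 when no checkpoints have been recorded is an assumption made to avoid a division by zero.

```python
def checkpoint_write_ratio(checkpoints_timed: int, checkpoints_req: int) -> float:
    """Percentage of all checkpoints that were requested (forced)."""
    total = checkpoints_timed + checkpoints_req
    if total == 0:
        # Assumption: with no checkpoints recorded yet, report 0 percent.
        return 0.0
    # Algebraically the same as (checkpoints_req / total) * 100.
    return 100.0 * checkpoints_req / total


if __name__ == "__main__":
    # Hypothetical counter values: 95 scheduled and 5 forced checkpoints.
    print(checkpoint_write_ratio(checkpoints_timed=95, checkpoints_req=5))
```

Because both inputs are cumulative counters, the result describes the whole period since the statistics were last reset. Applying the same formula to the differences between two samples would give a ratio for that time window only.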
A ratio of 0% means:
All checkpoints are happening on schedule (checkpoints_timed)
No emergency/forced checkpoints (checkpoints_req = 0)
Your WAL (Write-Ahead Log) configuration is properly sized
The system isn't being overwhelmed with writes
Suggested values for alerting are as follows (these are only suggestions and can be adjusted to suit the environment and monitoring needs):
0-10%: Excellent (your case)
10-20%: Good
20-50%: Needs attention
>50%: Poor - consider increasing max_wal_size
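The suggested bands above can be expressed as a small classification helper. This is a sketch only. The function name is hypothetical, and because the listed ranges share their boundary values, it assumes that a boundary value (10, 20 or 50) belongs to the better band.

```python
def classify_ratio(ratio_percent: float) -> str:
    """Map a checkpoint write ratio (0-100) to one of the suggested bands."""
    if not 0.0 <= ratio_percent <= 100.0:
        raise ValueError("ratio_percent must be between 0 and 100")
    # Assumption: boundary values fall into the better (lower) band.
    if ratio_percent <= 10.0:
        return "Excellent"
    if ratio_percent <= 20.0:
        return "Good"
    if ratio_percent <= 50.0:
        return "Needs attention"
    return "Poor"


if __name__ == "__main__":
    for value in (0.0, 15.0, 35.0, 75.0):
        print(value, classify_ratio(value))
```

The band limits are ordinary constants here, so they can be changed to match the thresholds chosen for a given environment.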
This enhancement will be available in DX O2 SaaS release 25.11.