Enhancement Request for Better Monitoring of Checkpoints

Article ID: 412407

Products

DX SaaS

Issue/Introduction

You are looking for better monitoring of PostgreSQL checkpoints with the APMIA PostgreSQL extension.

Currently, there are three metrics related to checkpoint monitoring:

Checkpoints:Checkpoints Requested
Checkpoints:Checkpoints Scheduled
Checkpoints:Time Spent Where Files Are Written To Disk (ms)

The full metric path is:

SuperDomain|<hostname>|Infrastructure|Agent|Postgres Databases|<database server IP/hostname>|postgres|....

The metric Checkpoints:Time Spent Where Files Are Written To Disk (ms) appears to report a cumulative value, which makes it of little use for detecting a checkpoint problem when one occurs.
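A cumulative counter only grows (until the statistics are reset), so a common workaround is to alert on the increase between successive samples rather than on the raw value. The sketch below assumes you can export periodic samples of the metric as (timestamp, value) pairs. The function name `interval_deltas` and the reset-handling rule are hypothetical illustrations, not part of the APMIA extension.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative value such as milliseconds written)
Sample = Tuple[float, float]


def interval_deltas(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert cumulative samples into per-interval increases.

    Returns (end_timestamp, increase) for each consecutive pair of samples.
    If the value drops, we assume the statistics were reset and treat the
    new value as the increase since the reset (an assumption).
    """
    deltas = []
    for (_, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        increase = v_cur - v_prev
        if increase < 0:  # counter went backwards: assume a statistics reset
            increase = v_cur
        deltas.append((t_cur, increase))
    return deltas


if __name__ == "__main__":
    # Hypothetical one-minute samples of the cumulative write-time metric
    samples = [(0, 1000.0), (60, 1500.0), (120, 1500.0), (180, 200.0)]
    for ts, ms in interval_deltas(samples):
        print(f"t={ts}: {ms} ms spent writing in the last interval")
```

A per-interval value like this can then be compared against a threshold for a given time window, which a raw cumulative total cannot support.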

Your requirement is a value on which you can set a threshold and receive an alert when there is a checkpoint issue during a given time window.

Resolution

The APMIA Infrastructure Agent Engineering/DEV team has reviewed this Enhancement Request.

They plan to add the following functionality to the APMIA Infrastructure Agent PostgreSQL Extension.

Once the Enhancement Request is complete, this functionality will be available out of the box in the agent extension.

The following will be implemented:

Checkpoint Write Ratio

checkpoint_write_ratio = (checkpoints_req / (checkpoints_timed + checkpoints_req)) * 100

checkpoints_req: Checkpoints forced by reaching max_wal_size (or checkpoint_segments in PostgreSQL 9.4 and earlier), i.e. emergency/forced checkpoints
checkpoints_timed: Checkpoints that happened on schedule due to checkpoint_timeout
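The formula above can be sketched in Python as follows. In PostgreSQL the two counters are exposed in the `pg_stat_bgwriter` view (in PostgreSQL 17 and later they moved to `pg_stat_checkpointer` as `num_requested` and `num_timed`), and they are cumulative since the last statistics reset. Applying the formula to the raw counters therefore gives a lifetime ratio, while applying it to the increase between two samples gives a ratio for that time window. The function names are hypothetical, and returning `None` when no checkpoints occurred (or when the counters were reset) is an assumption.

```python
from typing import Optional, Tuple


def checkpoint_write_ratio(checkpoints_req: int, checkpoints_timed: int) -> Optional[float]:
    """Percentage of checkpoints that were requested (forced) rather than timed.

    Returns None when no checkpoints occurred, since the ratio is undefined.
    """
    total = checkpoints_timed + checkpoints_req
    if total == 0:
        return None
    return 100.0 * checkpoints_req / total


def windowed_checkpoint_write_ratio(
    prev: Tuple[int, int], cur: Tuple[int, int]
) -> Optional[float]:
    """Ratio over the window between two (checkpoints_req, checkpoints_timed) samples."""
    req_delta = cur[0] - prev[0]
    timed_delta = cur[1] - prev[1]
    if req_delta < 0 or timed_delta < 0:
        return None  # counters were reset inside the window; skip this window
    return checkpoint_write_ratio(req_delta, timed_delta)


if __name__ == "__main__":
    print(checkpoint_write_ratio(3, 97))  # lifetime ratio from raw counters
    print(windowed_checkpoint_write_ratio((10, 400), (15, 420)))  # ratio for one window
```

The windowed form suits threshold alerting, because a burst of forced checkpoints shows up immediately instead of being diluted by the lifetime totals.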

A ratio of 0% indicates that:

All checkpoints are happening on schedule (checkpoints_timed)
There are no emergency/forced checkpoints (checkpoints_req = 0)
Your WAL (Write-Ahead Log) configuration is properly sized
The system is not being overwhelmed with writes

Suggested values for alerting are as follows (these are only suggestions and can be adjusted to suit the environment and monitoring needs):

0-10%: Excellent (your case)
10-20%: Good
20-50%: Needs attention
>50%: Poor - consider increasing max_wal_size
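The suggested bands can be turned into a simple alert rule. The sketch below maps a ratio to the labels listed above. Because the ranges as written share their boundaries, treating each upper bound as inclusive (so exactly 10% counts as Excellent) is an assumption, as are the hypothetical function names and the default alert threshold of 20%.

```python
def classify_checkpoint_write_ratio(ratio: float) -> str:
    """Map a checkpoint write ratio (0-100) to the suggested alerting band."""
    if not 0 <= ratio <= 100:
        raise ValueError(f"ratio must be between 0 and 100, got {ratio}")
    if ratio <= 10:
        return "Excellent"
    if ratio <= 20:
        return "Good"
    if ratio <= 50:
        return "Needs attention"
    return "Poor - consider increasing max_wal_size"


def should_alert(ratio: float, threshold: float = 20.0) -> bool:
    """Alert when the ratio exceeds the threshold ('Needs attention' or worse by default)."""
    return ratio > threshold


if __name__ == "__main__":
    for r in (5.0, 15.0, 35.0, 75.0):
        print(r, classify_checkpoint_write_ratio(r), should_alert(r))
```

Evaluating the rule on a ratio computed from counter increases over a fixed window, rather than from lifetime totals, keeps each alert tied to that window.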

The functionality from this Enhancement Request will be available in the DX O2 SaaS release 25.11.