As per the official documentation, the rule_id column refers to the rule that triggered this alert.
Documentation:
gpmetrics Schema Reference.
When you query the table, the rule_id references a number and doesn't give a direct explanation as to what error type it refers to.
gpperfmon=# select * from gpmetrics.gpcc_alert_history where rule_id <> 14 limit 5;
-[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 26
rule_id | 4
transaction_time | 2022-08-31 12:45:50+01
content | {"Qid":{"Tmid":1661850217,"Ssid":9763,"Ccnt":26},"Database":"gpadmin","User":"gpadmin","QueryText":"select largefunction();","SubmitTime":"2022-08-31T12:45:38+01:00","StartTime":"2022-08-31T12:45:38+01:00","ErrorMsg":"Out of memory"}
config | {}
This is sometimes noticeable in the "content" column. For the above example, rule_id 4 is for an "Out of memory" message.
Below are the codes for the rule_id column:
1. Query Run Time Alert
2. Query Block Time Alert
3. Query Spill Files Alert
4. Query OutOfMemory alert
5. Connection Number alert
6. Segment Down alert
7. Segment Disk Usage alert
8. Segment CPU Usage alert
9. Master CPU Usage alert
10. Segment Memory Usage alert
11. Master Memory Usage alert
12. Connection Failure alert
13. Master PgLog Panic alert
14. Master PgLog Fatal alert
So in the event of an issue, you can refer to the gpcc_alert_history and search based on the rule_id. Equally you can use a similar query as the one below to highlight the alert type based on the rule_id. If the rule_id falls out of the 1-14 range, it is displayed as unknown. (I have yet to see anything fall outside of this range)
gpperfmon=# \x
Expanded display is on.
gpperfmon=#
gpperfmon=# SELECT id,transaction_time, rule_id,
case when rule_id = 1 then 'Query Run Time Alert'
when rule_id = 2 then 'Query Block Time Alert'
when rule_id = 2 then 'Query Spill Files Alert'
when rule_id = 4 then 'Query OutOfMemory alert'
when rule_id = 5 then 'Connection Number alert'
when rule_id = 6 then 'Segment Down alert'
when rule_id = 7 then 'Segment Disk Usage alert'
when rule_id = 8 then 'Segment CPU Usage alert'
when rule_id = 9 then 'Master CPU Usage alert'
when rule_id = 10 then 'Segment Memory Usage alert'
when rule_id = 11 then 'Master Memory Usage alert'
when rule_id = 12 then 'Connection Failure alert'
when rule_id = 13 then 'Master log Panic alert'
when rule_id = 14 then 'Master log Fatal alert'
else 'Unknown'
end as alert_type,
content, config
FROM gpmetrics.gpcc_alert_history;
-[ RECORD 1 ]----+--------------------------------------------------------------------
id | 1
transaction_time | 2023-04-02 03:00:34+01
rule_id | 14
alert_type | Master log Fatal alert
content | [{"Time":"2023-04-02 03:00:12","User":"","Message":"terminating background worker \"dtx recovery process\" due to administrator command"},{"Time":"2023-04-02 03:00:12","User":"gpmon","Message":"terminating connection due to administrator command"},{"Time":"2023-04-02 03:00:12","User":"","Message":"terminating background worker \"ftsprobe process\" due to administrator command"},{"Time":"2023-04-02 03:00:12","User":"gpmon","Message":"terminating connection due to administrator command"},{"Time":"2023-04-02 03:00:12","User":"gpmon","Message":"terminating connection due to administrator command"},{"Time":"2023-04-02 03:00:12","User":"gpmon","Message":"terminating connection due to administrator command"}]
config | {}
OR
Create a rule_id_list table:
create table rule_id_list ( rule_id integer, rule_name text) distributed by (rule_id);
insert into rule_id_list values
(1, 'Query Run Time Alert'),
(2, 'Query Block Time Alert'),
(3, 'Query Spill Files Alert'),
(4, 'Query OutOfMemory alert'),
(5, 'Connection Number alert'),
(6, 'Segment Down alert'),
(7, 'Segment Disk Usage alert'),
(8, 'Segment CPU Usage alert'),
(9, 'Master CPU Usage alert'),
(10, 'Segment Memory Usage alert'),
(11, 'Master Memory Usage alert'),
(12, 'Connection Failure alert'),
(13, 'Master PgLog Panic alert'),
(14, 'Master PgLog Fatal alert');
Then run the query:
SELECT id, transaction_time, a.rule_id, l.rule_name, content, config FROM gpmetrics.gpcc_alert_history a
JOIN rule_id_list l ON (a.rule_id = l.rule_id);