In a cluster of 5 databases(DB1, DB2, DB3, DB4, DB5), the below command only outputs DB1, DB2 and DB3, which is not expected:
curl -H "Authorization: Bearer xxxxx" http:/<ip_address>:<perf_port>/table_metrics | awk -F'"' '{print $2}' | sort | uniq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8862k 0 8862k 0 0 26.0M 0 --:--:-- --:--:-- --:--:-- 26.0M
DB1
DB2
DB3
A typical cause is the limit controlled by a filtering parameter table_max_rows in gpcc.conf has been reached. Here below is an sample configuration in gpcc.conf:
[metrics_export_table]
filter_combine = or
table_min_size_bytes = 1073741824
table_min_bloat_ratio = 0.3
table_min_skew = 0.2
table_max_rows = 5000
exclude_databases = postgres, template0
exclude_schemas = pg_temp*
include_databases =
include_schemas =
include_tables =
exclude_tables =The internal mechanism is GPCC Background Scan will reach all the databases one by one(normally in alphabetic order) until the number of tables scanned reaches table_max_rows(in this example, it's 5000). So if the 5000th table falls right into DB3, the scan stops here and the rest tables will not be scanned. That's why we don't see DB4 and DB5 from the result of the curl command mentioned above.
To verify it, we can also use the below command, if it returns 5000, then it proves that's the cause:
curl -H "Authorization: Bearer xxxxx=" http://<ip_address>:<perf_port>/table_metrics | grep -c "gpdb_table_size_bytes{"
If we want it to scan more tables, we may need to increase the value of table_max_rows. Note that the performance might be impacted by the increase of the value.
This is just a typcial cause for this scenario. In fact, there are some other filtering factors like table_min_size_bytes, table_min_bloat_ratio and table_min_skew, etc. may also affect the results returned from the table_metrics.