Troubleshooting embedded database performance in VMware Cloud Director environments

Products

VMware Cloud Director

Issue/Introduction

Database utilization is consistently showing spikes.
High utilization causes production impact with the postgres disk usage at 100%.

Environment

VMware Cloud Director 10.5

VMware Cloud Director 10.6

Resolution

For assistance, contact technical support and note this Knowledge Article ID (393588) in the problem description. For more information, see How to Submit a Support Request.

Share the following information when opening the case:

When the spike happens, increase the logging levels to TRACE from DEBUG and get the logs collected.

To increase logging levels, refer documentation.

Change is required for the following parameters:

From
log4j.logger.com.vmware.vcloud.inventory.impl=DEBUG, vclistener
to
log4j.logger.com.vmware.vcloud.inventory.impl=TRACE, vclistener

AND

From
log4j.logger.com.vmware.vcloud.vimproxy.internal.impl=DEBUG, vclistener
to
log4j.logger.com.vmware.vcloud.vimproxy.internal.impl=TRACE, vclistener

NOTE :

Cell restart is NOT required for these changes to take effect.

No downtime involved in these changes

Ensure there is enough diskspace in case the logs fill up the partition.

Once the logs are collected, revert the TRACE settings to DEBUG.

Check the top 10 large tables

select schemaname as table_schema,
relname as table_name,
pg_size_pretty(pg_total_relation_size(relid)) as total_size,
pg_size_pretty(pg_relation_size(relid)) as data_size,
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid))as external_size
from pg_catalog.pg_statio_user_tables
order by pg_total_relation_size(relid) desc, pg_relation_size(relid) desc
limit 10;
Check the data related to toast:

SELECT *, pg_size_pretty(total_bytes) AS total,
pg_size_pretty(index_bytes) AS INDEX,
pg_size_pretty(toast_bytes) AS toast,
pg_size_pretty(table_bytes) AS TABLE
FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r'
) a
) a ORDER by total_bytes desc
LIMIT 20;
Check the data related to dead tuples.

SELECT psut.relname, to_char(psut.last_vacuum, 'YYYY-MM-DD HH24:MI') as last_vacuum, to_char(psut.last_autovacuum, 'YYYY-MM-DD HH24:MI') as last_autovacuum, to_char(pg_class.reltuples, '9G999G999G999') AS n_tup, to_char(psut.n_dead_tup, '9G999G999G999') AS dead_tup, to_char(CAST(current_setting('autovacuum_vacuum_threshold') AS bigint) + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric) * pg_class.reltuples), '9G999G999G999') AS av_threshold, CASE WHEN CAST(current_setting('autovacuum_vacuum_threshold') AS bigint) + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric) * pg_class.reltuples) < psut.n_dead_tup THEN '*' ELSE '' END AS expect_av FROM pg_stat_user_tables psut JOIN pg_class on psut.relid = pg_class.oid ORDER BY 5 desc;
Monitor if auto vacuum is cleaning up data as expected:

SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;