Troubleshooting embedded database performance in VMware Cloud Director environments
search cancel

Troubleshooting embedded database performance in VMware Cloud Director environments

book

Article ID: 393588

calendar_today

Updated On:

Products

VMware Cloud Director

Issue/Introduction

  • Database utilization is consistently showing spikes.
  • High utilization causes production impact with the postgres disk usage at 100%.

Environment

VMware Cloud Director 10.5

VMware Cloud Director 10.6

Resolution

For assistance, contact technical support and note this Knowledge Article ID (393588) in the problem description. For more information, see How to Submit a Support Request.

Share the following information when opening the case:

  1. When the spike happens, increase the logging levels to TRACE from DEBUG and get the logs collected.
     
    1. To increase logging levels, refer  documentation.
    2. Change is required for the following parameters: 

      From
      log4j.logger.com.vmware.vcloud.inventory.impl=DEBUG, vclistener
      to
      log4j.logger.com.vmware.vcloud.inventory.impl=TRACE, vclistener

      AND

      From
      log4j.logger.com.vmware.vcloud.vimproxy.internal.impl=DEBUG, vclistener
      to
      log4j.logger.com.vmware.vcloud.vimproxy.internal.impl=TRACE, vclistener

      NOTE : 
      Cell restart is NOT required for these changes to take effect.
      No downtime involved in these changes
      Ensure there is enough diskspace in case the logs fill up the partition. 
      Once the logs are collected, revert the TRACE settings to DEBUG.
  2. Check the top 10 large tables

    select schemaname as table_schema,
    relname as table_name,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size,
    pg_size_pretty(pg_relation_size(relid)) as data_size,
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid))as external_size
    from pg_catalog.pg_statio_user_tables
    order by pg_total_relation_size(relid) desc, pg_relation_size(relid) desc
    limit 10;

  3. Check the data related to toast:

    SELECT *, pg_size_pretty(total_bytes) AS total,
    pg_size_pretty(index_bytes) AS INDEX,
    pg_size_pretty(toast_bytes) AS toast,
    pg_size_pretty(table_bytes) AS TABLE
      FROM (
      SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
          SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
                  , c.reltuples AS row_estimate
                  , pg_total_relation_size(c.oid) AS total_bytes
                  , pg_indexes_size(c.oid) AS index_bytes
                  , pg_total_relation_size(reltoastrelid) AS toast_bytes
              FROM pg_class c
              LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
              WHERE relkind = 'r'
      ) a
    ) a ORDER by total_bytes desc
    LIMIT 20;

  4. Check the data related to dead tuples.

    SELECT psut.relname,
         to_char(psut.last_vacuum, 'YYYY-MM-DD HH24:MI') as last_vacuum,
         to_char(psut.last_autovacuum, 'YYYY-MM-DD HH24:MI') as last_autovacuum,
         to_char(pg_class.reltuples, '9G999G999G999') AS n_tup,
         to_char(psut.n_dead_tup, '9G999G999G999') AS dead_tup,
         to_char(CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
             + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
                * pg_class.reltuples), '9G999G999G999') AS av_threshold,
         CASE
             WHEN CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
                 + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
                    * pg_class.reltuples) < psut.n_dead_tup
             THEN '*'
             ELSE ''
         END AS expect_av
    FROM pg_stat_user_tables psut
         JOIN pg_class on psut.relid = pg_class.oid
    ORDER BY 5 desc;


  5. Monitor if auto vacuum is cleaning up data as expected:

    SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;