SDDC Manager backup times out with error "Package SDDC Manager backup operation failed" due to increased size of the PostgreSQL database.

Article ID: 385130

Products

VMware SDDC Manager

Issue/Introduction

  • The backup fails at the sub-task "Package and Encrypt SDDC Manager Backup" with the error "Package SDDC Manager backup operation failed".
  • The SDDC Manager log contains errors similar to the excerpt below:

    /var/log/vmware/vcf/sddc-support/vcf-sos.log

    ERROR [vcf_sos] [commandutils.py::execute_cmd_locally::244::backup-####] Std Error: ; Command cd
    /var/log/vmware/vcf/sddc-support/backup-yyyy-mm-dd-hh-mm-ss-2265; tar -zcvpf vcf-backup-vcf-sddc-example.com-yyyy-mm-dd-hh-mm-ss.tar.gz.tmp
    vcf-backup-vcf-sddc-example.com-yyyy-mm-dd-hh-mm-ss/* did not complete even after waiting for 120 seconds
    ERROR [vcf_sos] [backuphelper.py::package_sddc_mgr_backup::249::backup-####]
    Command execution failed : cd /var/log/vmware/vcf/sddc-support/backup-yyyy-mm-dd-hh-mm-ss-2265; tar -zcvpf vcf-backup-vcf-sddc-example.com-yyyy-mm-dd-hh-mm-ss.tar.gz.tmp vcf-backup-vcf-sddc-example.com-yyyy-mm-dd-hh-mm-ss/*
    ERROR [vcf_sos] [backuphelper.py::package_sddc_mgr_backup::269::backup-####] Package SDDC Manager backup operation failed
    ERROR [vcf_sos] [backuphelper.py::package_sddc_mgr_backup::271::backup-####] Traceback (most recent call last):
    ERROR [vcf_sos] [backupservice.py::execute::78::backup-####] Package SDDC Manager backup operation failed
    ERROR [vcf_sos] [backupservice.py::execute::79::backup-#####] SDDC Manager backup operation failed in task PackageSDDCManagerBackup.

  • The KB article SDDC manager backup fails with error "Package SDDC Manager backup operation failed" does not apply, because no large files are found in the /opt/vmware or /etc/vmware directories or their subdirectories. Instead, the issue appears to be related to sddc-postgres.bkp, the PostgreSQL dump file created by the backup process, which is unusually large.

  • The following query against the PostgreSQL database shows that the vault_secret table in the operationsmanager database consumes an unusually large amount of space.

    psql -h localhost -U postgres -d operationsmanager -c "select table_schema, table_name, pg_size_pretty(pg_total_relation_size(quote_ident(table_name))) size from information_schema.tables where table_schema='public' order by pg_total_relation_size(quote_ident(table_name)) desc;"

    Example output (vault_secret table size of about 4.3 GB):

     table_schema | table_name                    | size 
    --------------+-------------------------------+---------
    public        | vault_secret                  | 4307 MB
    public        | processing_task               | 36 MB
    public        | processing_context            | 9200 kB
    public        | execution                     | 648 kB
    public        | task                          | 208 kB
    public        | execution_to_task             | 104 kB
    public        | databasechangelog             | 64 kB
    public        | config_drift_reconciliations  | 32 kB
    public        | databasechangeloglock         | 24 kB
    public        | vault_cipher_version          | 24 kB
    (10 rows)

  • Similarly, the following query shows that a very large number of assessment runs account for the data stored in the vault_secret table.

    psql -h localhost -U postgres -d operationsmanager -c "select left(subquery_alias.description, 60) as task, pg_size_pretty(cast(subquery_alias.sum as BIGINT)) as size, subquery_alias.cnt from (select t.description, sum(length(vs.secret_text)) as sum, count(*) as cnt from vault_secret vs inner join execution et on vs.id = et.id inner join execution_to_task ett on et.id = ett.execution_id inner join task t on ett.task_id = t.id group by t.description) as subquery_alias order by subquery_alias.sum desc;"

    Example output (2857 assessment runs):

                                    task                             |    size    | cnt
    -------------------------------------------------------------+------------+------
     Assess SDDC Domain(s)                                       | 4307 MB    | 2857
     Commissioning host(s) hostname.example.com                  | 58 kB      | 1
     Decommissioning host(s) hostname.example.com                | 5160 bytes | 1
    (3 rows)
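
To confirm that a failed backup matches the symptoms listed above rather than a different packaging failure, the log signature can be checked directly. The following is a minimal sketch, assuming a Bash shell on the SDDC Manager appliance. The function name find_backup_timeouts is hypothetical and is not part of any VMware tooling; the default path is the log file named in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a VMware tool): print the log lines that match
# the packaging-timeout signature described in this article.
# Usage: find_backup_timeouts [LOG_FILE]
find_backup_timeouts() {
  # Default to the SoS log named in this article.
  local log_file="${1:-/var/log/vmware/vcf/sddc-support/vcf-sos.log}"

  if [ ! -r "$log_file" ]; then
    echo "Cannot read log file: $log_file" >&2
    return 2
  fi

  # -n prefixes each match with its line number; -E enables the alternation.
  # The function returns grep's status: 0 if a line matched, 1 if none did.
  grep -nE 'did not complete even after waiting for [0-9]+ seconds|Package SDDC Manager backup operation failed' "$log_file"
}
```

Calling find_backup_timeouts with no argument reads the default log. A non-zero return status means the signature was not found, in which case this article may not describe the failure.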

Environment

VMware SDDC Manager 5.x
VMware SDDC Manager 4.x

Cause

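The packaging step compresses the backup directory with tar and is limited to 120 seconds, as the log excerpt shows. When the PostgreSQL dump is unusually large, compression does not finish within that limit and the backup fails.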

The growth of the PostgreSQL database is a known issue in SDDC Manager: there is no mechanism to clean up old precheck run records from the vault_secret table.
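
Because the packaging step is a gzip-compressed tar, one rough way to judge whether a directory of a given size can be packaged within 120 seconds is to time a comparable compression without writing an archive. The following is a minimal sketch, assuming Bash and GNU tar. The function name time_backup_packaging and its arguments are hypothetical, and the result is only indicative: it runs outside the backup workflow, uses slightly different tar arguments than the command in the log, and adds CPU and disk load while it runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a VMware tool): time a gzip-compressed tar of a
# directory and compare the elapsed time with a limit in seconds.
# Usage: time_backup_packaging DIR [LIMIT_SECONDS]
time_backup_packaging() {
  local dir="$1"
  local limit="${2:-120}"   # 120 s is the wait time reported in the log

  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 2
  fi

  local start end elapsed bytes
  start="$(date +%s)"
  # Stream the archive to wc so that nothing is written to disk;
  # tr strips any padding that some wc implementations add.
  bytes="$(tar -zcpf - -C "$dir" . | wc -c | tr -d '[:space:]')"
  end="$(date +%s)"
  elapsed=$(( end - start ))

  echo "compressed_bytes=${bytes} elapsed_seconds=${elapsed} limit_seconds=${limit}"
  # Succeed only if compression finished within the limit.
  [ "$elapsed" -le "$limit" ]
}
```

The function prints the compressed size and elapsed time, and returns non-zero when the elapsed time exceeds the limit. The directory to measure is left as an argument because the location of the dump file during a backup run is not stated in this article.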

Resolution

Engineering is aware of this issue and is working to resolve it in a future release.

Contact VMware by Broadcom Support for further assistance.