vCSA Backup failed with "ERROR: Timeout! Failed to complete in 72000 seconds"
search cancel

vCSA Backup failed with "ERROR: Timeout! Failed to complete in 72000 seconds"

book

Article ID: 317884

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Examining the backup.log file at /var/log/vmware/applmgmt
We can see the following highlighted entries
2019-09-02T01:09:49.553 [VCDBBackup:PID-36378] INFO: Executing command: /opt/vmware/vpostgres/current/bin/psql -U postgres -p 5432 -t -c "select pg_is_in_backup();".
2019-09-02T01:09:49.579 [VCDBBackup:PID-36378] INFO: Postgres on-line backup already finished. Skip stopping backup operation
2019-09-02T01:09:49.579 [VCDBBackup:PID-36378] INFO: Successfully cleaned up for VCDB backup.
2019-09-02T21:01:08.368 [MainProcess:PID-36035] ERROR: Timeout! Failed to complete in 72000 seconds
2019-09-02T21:01:08.369 [MainProcess:PID-36035] ERROR: BackupManager encountered an exception: Timeout! Failed to complete in 72000 seconds.
2019-09-02T21:01:08.371 [MainProcess:PID-36035] INFO: Cleaning up the backup job.
2019-09-02T21:01:08.371 [MainProcess:PID-36035] INFO: Cleaning up all running backup child processes.
2019-09-02T21:01:08.432 [MainProcess:PID-36035] INFO: Cleaning up all sizeFiles.
2019-09-02T21:01:08.433 [MainProcess:PID-36035] INFO: Cleaning up Lotus.
2019-09-02T21:01:08.433 [MainProcess:PID-36035] INFO: Successfully completed Lotus cleanup.
2019-09-02T21:01:08.433 [MainProcess:PID-36035] INFO: Cleaning up ComponentScripts.
2019-09-02T21:01:08.433 [MainProcess:PID-36035] INFO: Cleaning up ConfigFiles.
2019-09-02T21:01:08.433 [MainProcess:PID-36035] INFO: Cleaning up StatsMonitorDB.
2019-09-02T21:01:08.433 [MainProcess:PID-36035] INFO: Cleaning up VCDB.
2019-09-02T21:01:08.434 [MainProcess:PID-36035] INFO: Finishing full database backup
2019-09-02T21:01:08.434 [MainProcess:PID-36035] INFO: Retrieving postgres server listening port
2019-09-02T21:01:08.434 [MainProcess:PID-36035] INFO: Executing command: netstat -plnt.
2019-09-02T21:01:08.508 [MainProcess:PID-36035] INFO: Canceling running pg_start_backup() process.
2019-09-02T21:01:08.509 [MainProcess:PID-36035] INFO: Executing command: ['/opt/vmware/vpostgres/current/bin/psql', '-U', 'postgres', '-p', '5432', '-At', '-c', "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid()  and query ~ '^select\\ pg_xlogfile_name\\(pg_start_backup\\(.*\\)\\)\\;' "].
2019-09-02T21:01:08.540 [MainProcess:PID-36035] INFO: No pg_start_backup() process is running.
2019-09-02T21:01:08.541 [MainProcess:PID-36035] INFO: Checking whether Postgres online backup still in progress
2019-09-02T21:01:08.541 [MainProcess:PID-36035] INFO: Executing command: /opt/vmware/vpostgres/current/bin/psql -U postgres -p 5432 -t -c "select pg_is_in_backup();".
2019-09-02T21:01:08.570 [MainProcess:PID-36035] INFO: Postgres on-line backup already finished. Skip stopping backup operation
2019-09-02T21:01:08.571 [MainProcess:PID-36035] INFO: Successfully cleaned up for VCDB backup.
2019-09-02T21:01:08.571 [MainProcess:PID-36035] INFO: Cleaning up failed backup files.
2019-09-02T21:01:08.728 [MainProcess:PID-36035] INFO: Cleaned up the backup job.
2019-09-02T21:01:09.537 [MainProcess:PID-36035] INFO: Event com.vmware.applmgmt.backup.job.failed.event successfully posted to http://localhost:8085/sdk
2019-09-02T21:01:09.548 [MainProcess:PID-36035] INFO: Backup job failed.
  • Examining carefully the log, isolating the processes related to ComponentScriptsBackup
# grep -i "ComponentScriptsBackup" backup.log
 
a) vum component backup starts and finishes successfully
2019-09-02T01:01:08.334 [ComponentScriptsBackup:PID-36375] INFO: Starting backup component: vum 
2019-09-02T01:01:08.343 [ComponentScriptsBackup:PID-36375] INFO: Execute vum script: /etc/vmware/backup/component-scripts/vum/backup_restore.py --startBackup
2019-09-02T01:01:08.370 [ComponentScriptsBackup:PID-36375] INFO: Dispatching stream.
2019-09-02T01:01:09.316 [ComponentScriptsBackup:PID-36375] INFO: Component vum backup successful.
 
b) imagebuilder component starts and finishes successfully
2019-09-02T01:01:09.317 [ComponentScriptsBackup:PID-36375] INFO: Starting backup component: imagebuilder
2019-09-02T01:01:09.317 [ComponentScriptsBackup:PID-36375] INFO: Execute imagebuilder script: /etc/vmware/backup/component-scripts/imagebuilder/backup_restore.py --startBackup
2019-09-02T01:01:09.334 [ComponentScriptsBackup:PID-36375] INFO: Dispatching stream.
2019-09-02T01:01:10.633 [ComponentScriptsBackup:PID-36375] INFO: Component imagebuilder backup successful.
 
c) imagebuilder component starts but there's not a line with INFO: Component rdb backup successful
2019-09-02T01:01:10.637 [ComponentScriptsBackup:PID-36375] INFO: Starting backup component: rbd
2019-09-02T01:01:10.637 [ComponentScriptsBackup:PID-36375] INFO: Execute rbd script: /etc/vmware/backup/component-scripts/rbd/rbd-backup-restore --startBackup
2019-09-02T01:01:10.643 [ComponentScriptsBackup:PID-36375] INFO: Dispatching stream.
--> Missing a line with INFO: Component rdb backup successful
 
 


Environment

VMware vSphere 6.7.x

Cause

This indicates that the rdb database backup has not completed. The reason is because that database is constantly being updated. Frequent changes or other thread/processes that modify it can cause a running backup to restart. If the restarts happen frequently enough, it may never complete and the backup of the VCSA times out.

Resolution

It will be fixed in vSphere 6.7 P01 release

Workaround:
There are two possible workarounds

a) If autodeploy service is used, stopping autodeploy service before performing the backup

Note: Stopping autodeploy during the backup aims to prevent the backup from restarting. Note that vCenter may not PXE-boot hosts when autodeploy is stopped

# service-control --stop vmware-rbd-watchdog
Once that the backup finishes, the service can be restarted
# service-control --start vmware-rbd-watchdog

To disable autodeploy and make it persistent across reboots, by turning off the automatic startup of the service on reboot
# /usr/lib/vmware-vmon/vmon-cli --update rbd --starttype MANUAL

b) Skipping rdb backup
Note that with this second workaround autodeploy related information will not be present in the backup
# mv /etc/vmware/backup/manifests/rbd.json /etc/vmware/backup/manifests/rbd.json_bak