vCenter Server VAMI backup fails intermittently with Error: "BackupManager encountered an exception"
search cancel

vCenter Server VAMI backup fails intermittently with Error: "BackupManager encountered an exception"

book

Article ID: 316517

calendar_today

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • VAMI based vCenter server backup fails intermittently.
  • Error on the VAMI Console:  "BackUp Manager encountered an Exception"
  • Subsequent attempts for the Back-Up jobs complete successfully with no errors. 


Environment

VMware vCenter Server 7.0.3
VMware vCenter Server 7.0.0

Cause

The backup failure error reported here is: "BrokenPipeError: [Errno 32] Broken pipe"
This primarily indicates an intermittent network issue. Although the manual upload has passed, such errors are hard to trace and can happen anytime there is a minor glitch between the VC and the backup.


Log File Location  :  /var/log/vmware/applmgmt/backup.log
Backup failure log snippets in this scenario: 

2024-01-25T18:30:03.214 [20240125-182910-22357613] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:345] ERROR: Process returncode is -13, but expected exit codes are [0].
2024-01-25T18:30:03.214 [20240125-182910-22357613] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:327] ERROR: rc: 1, stderr: Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/plugins/../util/Calculate.py", line 59, in <module>
    main(sys.argv[1], sys.argv[2], sys.argv[3])
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/plugins/../util/Calculate.py", line 46, in main
    stdout_obj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

2024-01-25T18:30:03.215 [20240125-182910-22357613] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:332] INFO: Skip to report the error.
2024-01-25T18:30:03.215 [20240125-182910-22357613] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:345] ERROR: Process returncode is 1, but expected exit codes are [0].
2024-01-25T18:30:03.215 [20240125-182910-22357613] [StatsMonitorDBBackup:PID-23629] [Proc::UpdateExceptionStatus:Proc.py:383] ERROR: Checksum not generated at /dev/shm/backupRestoreSumFile-20240125-182910-22357613-m56phujs 
2024-01-25T18:30:03.218 [20240125-182910-22357613] [StatsMonitorDBBackup:PID-23629] [StatsMonitorDB::BackupStatsMonitorDB:StatsMonitorDB.py:125] ERROR: Failed to dispatch dump image of Appliance Stats Monitor database.
Underlying process status. rc: -13
stdout: 
stderr: 
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/StatsMonitorDB.py", line 111, in BackupStatsMonitorDB
    db_path, dump_file)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/StatsMonitorDB.py", line 55, in _dump_sqlite_db
    stdout=PIPE, stdout_fn=dispatch_data)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 433, in RunCmd
    result = stdout_fn(process.stdout)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/StatsMonitorDB.py", line 50, in dispatch_data
    status)
util.Common.BackupRestoreError: Failed to dispatch dump image of Appliance Stats Monitor database.
Underlying process status. rc: -13
stdout: 
stderr: 
2024-01-25T18:30:03.641 [20240125-182910-22357613] [VCDB-WAL-Backup:PID-23655] [VCDB::_backup_wal_files:VCDB.py:798] INFO: VCDB backup WAL start not received yet.
2024-01-25T18:30:03.669 [20240125-182910-22357613] [MainProcess:PID-23351] [Proc::VerifyProcStatusAndGetArchive:Proc.py:158] ERROR: Error at process StatsMonitorDBBackup; rc:-13.
2024-01-25T18:30:03.669 [20240125-182910-22357613] [MainProcess:PID-23351] [Proc::VerifyProcStatusAndGetArchive:Proc.py:162] ERROR: stderr:Failed to dispatch dump image of Appliance Stats Monitor database.

2024-01-25T18:30:03.669 [20240125-182910-22357613] [MainProcess:PID-23351] [Proc::VerifyProcStatusAndGetArchive:Proc.py:172] INFO: Following error message isn't localized:
  stderr:Failed to dispatch dump image of Appliance Stats Monitor database.

2024-01-25T18:30:03.669 [20240125-182910-22357613] [MainProcess:PID-23351] [BackupManager::main:BackupManager.py:592] ERROR: BackupManager encountered an exception: Hit exception inside process StatsMonitorDBBackup: Checksum not generated at /dev/shm/backupRestoreSumFile-20240125-182910-22357613-m56phujs 



 
2024-01-25T18:30:35.513 [20240125-182910-22357613] [MainProcess:PID-23351] [BackupManager::Cleanup:BackupManager.py:406] ERROR: Failed to clean up backup child processes.
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 583, in main
    backupObj.DoBackup()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 335, in DoBackup
    self.LaunchBackupProcesses()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 302, in LaunchBackupProcesses
    self.ExecBackupsInParallel()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 272, in ExecBackupsInParallel
    taskId=self.args.id, operation='BACKUP')
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 202, in LaunchMultipleProcesses
    timeout, logger)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 178, in VerifyProcStatusAndGetArchive
    (procRecord.process.name, procRecord.status.excMsg))
Exception: Hit exception inside process StatsMonitorDBBackup: Checksum not generated at /dev/shm/backupRestoreSumFile-20240125-182910-22357613-m56phujs 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 251, in CleanupChildProcesses
    proc.wait(timeout=30)
  File "/usr/lib/python3.7/site-packages/psutil/__init__.py", line 1262, in wait
    return self._proc.wait(timeout)
  File "/usr/lib/python3.7/site-packages/psutil/_pslinux.py", line 1459, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/psutil/_pslinux.py", line 1637, in wait
    return _psposix.wait_pid(self.pid, timeout, self._name)
  File "/usr/lib/python3.7/site-packages/psutil/_psposix.py", line 104, in wait_pid
    delay = check_timeout(delay)
  File "/usr/lib/python3.7/site-packages/psutil/_psposix.py", line 66, in check_timeout
    raise TimeoutExpired(timeout, pid=pid, name=proc_name)
psutil._exceptions.TimeoutExpired: psutil.TimeoutExpired timeout after 30 seconds (pid=23627)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 403, in Cleanup
    CleanupChildProcesses()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 259, in CleanupChildProcesses
    raise Exception("%s" % str(e))
Exception: psutil.TimeoutExpired timeout after 30 seconds (pid=23627)
 
2024-01-25T18:30:35.548 [20240125-182910-22357613] [VCDBBackup:PID-23627] [VCDB::BackupVCDB:VCDB.py:2057] ERROR: Encounter error during backup VCDB.
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 1993, in BackupVCDB
    br_state.isFastBackupRequired())
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 571, in _start_pg_backup
    "backupfast" : 'true' if backup_fast else 'false'})
psycopg2.OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

2024-01-25T18:30:35.553 [20240125-182910-22357613] [VCDBBackup:PID-23627] [Proc::UpdateExceptionStatus:Proc.py:383] ERROR: terminating connection due to administrator command
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

2024-01-25T18:30:35.554 [20240125-182910-22357613] [VCDBBackup:PID-23627] [VCDB::BackupVCDB:VCDB.py:2070] INFO: Terminate sub process 23655

Resolution


Due to conflicting schedule operations, move the backup schedule to another time slot.
(e.g. configure an existing scheduled backup at 2 PM instead of 12PM)