Scheduled VAMI backups fail intermittently with Error: "BackupManager encountered an exception" due to network connectivity issues with backup server

Article ID: 316517

Products

VMware vCenter Server

Issue/Introduction

  • Scheduled VAMI-based vCenter Server backups fail intermittently, and the VAMI console shows an error similar to the following:
    "BackupManager encountered an exception"

  • Manual backups work as expected, and this behavior can be observed with any of the backup protocols SFTP, SCP, or SMB.

  • In /var/log/vmware/applmgmt/backup.log on the vCenter Server, entries similar to the following are observed:
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:345] ERROR: Process returncode is -13, but expected exit codes are [0]. 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:327] ERROR: rc: 1, stderr: Traceback (most recent call last):
 File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/plugins/../util/Calculate.py", line 59, in <module>
    main(sys.argv[1], sys.argv[2], sys.argv[3])
 File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/plugins/../util/Calculate.py", line 46, in main
    stdout_obj.write(data)
BrokenPipeError: [Errno 32] Broken pipe  
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:332] INFO: Skip to report the error. 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [StatsMonitorDBBackup:PID-23629] [Proc::GetProcsStatus:Proc.py:345] ERROR: Process returncode is 1, but expected exit codes are [0]. 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [StatsMonitorDBBackup:PID-23629] [Proc::UpdateExceptionStatus:Proc.py:383] ERROR: Checksum not generated at /dev/shm/backupRestoreSumFile-XXXXXXXXXXXXXXXX-m56phujs 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [StatsMonitorDBBackup:PID-23629] [StatsMonitorDB::BackupStatsMonitorDB:StatsMonitorDB.py:125] ERROR: Failed to dispatch dump image of Appliance Stats Monitor database.
Underlying process status. rc: -13 stdout: 
stderr: 
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/StatsMonitorDB.py", line 111, in BackupStatsMonitorDB
    db_path, dump_file)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/StatsMonitorDB.py", line 55, in _dump_sqlite_db
    stdout=PIPE, stdout_fn=dispatch_data)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 433, in RunCmd
    result = stdout_fn(process.stdout)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/StatsMonitorDB.py", line 50, in dispatch_data
    status)
util.Common.BackupRestoreError: Failed to dispatch dump image of Appliance Stats Monitor database.
Underlying process status. rc: -13
... 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [VCDB-WAL-Backup:PID-23655] [VCDB::_backup_wal_files:VCDB.py:798] INFO: VCDB backup WAL start not received yet. 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [MainProcess:PID-23351] [Proc::VerifyProcStatusAndGetArchive:Proc.py:158] ERROR: Error at process StatsMonitorDBBackup; rc:-13. 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [MainProcess:PID-23351] [Proc::VerifyProcStatusAndGetArchive:Proc.py:162] ERROR: stderr:Failed to dispatch dump image of Appliance Stats Monitor database.  
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [MainProcess:PID-23351] [Proc::VerifyProcStatusAndGetArchive:Proc.py:172] INFO: Following error message isn't localized:
  stderr:Failed to dispatch dump image of Appliance Stats Monitor database.  
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [MainProcess:PID-23351] [BackupManager::main:BackupManager.py:592] ERROR: BackupManager encountered an exception: Hit exception inside process StatsMonitorDBBackup: Checksum not generated at /dev/shm/backupRestoreSumFile-XXXXXXXXXXXXXXXX-m56phujs 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [MainProcess:PID-23351] [BackupManager::Cleanup:BackupManager.py:406] ERROR: Failed to clean up backup child processes.
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 583, in main
    backupObj.DoBackup()
  ...
Exception: Hit exception inside process StatsMonitorDBBackup: Checksum not generated at /dev/shm/backupRestoreSumFile-XXXXXXXXXXXXXXXX-m56phujs 
...
During handling of the above exception, another exception occurred:
...
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 251, in CleanupChildProcesses
    proc.wait(timeout=30)
  ...
psutil._exceptions.TimeoutExpired: psutil.TimeoutExpired timeout after 30 seconds (pid=23627)
...
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [VCDBBackup:PID-23627] [VCDB::BackupVCDB:VCDB.py:2057] ERROR: Encounter error during backup VCDB.
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 1993, in BackupVCDB
    br_state.isFastBackupRequired())
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 571, in _start_pg_backup
    "backupfast" : 'true' if backup_fast else 'false'})
psycopg2.OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request. 
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [VCDBBackup:PID-23627] [Proc::UpdateExceptionStatus:Proc.py:383] ERROR: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.  
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXXXX] [VCDBBackup:PID-23627] [VCDB::BackupVCDB:VCDB.py:2070] INFO: Terminate sub process 23655
...
During handling of the above exception, another exception occurred:
...
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXX] [VCDB-WAL-Backup:PID-XXXXXXX] [VCDB::run:VCDB.py:1111] ERROR: Failed to backup WAL files.
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXX] [VCDB-WAL-Backup:PID-XXXXXXX] [VCDB::run:VCDB.py:1112] ERROR: Failed to dispatch WAL meta.
Underlying process status. rc: 9
stdout:
stderr: b'curl: (9) Upload failed: Permission denied (3/-31)\n'
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXXXXXXX] [VCDBBackup:PID-1564169] [VCDB::BackupVCDB:VCDB.py:2057] ERROR: Encounter error during backup VCDB.
Traceback (most recent call last):
 File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 2030, in    BackupVCDB
  wal_backup_status = status_queue.get()
File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
  return _ForkingPickler.loads(res)
TypeError: BackupRestoreError.__init__() missing 1 required positional argument: 'status'
[YYYY-MM-DDTHH:MM:SS] [XXXXXXXXX] [VCDBBackup:PID-XXXX] [Proc::UpdateExceptionStatus:Proc.py:384] ERROR: BackupRestoreError.__init__() missing 1 required positional argument: 'status'
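
To check quickly whether a given backup.log contains the signatures quoted above, a small helper along the following lines may be useful. This is a minimal sketch, not a supported tool: the function name scan_backup_log is hypothetical, and the search strings are copied from the excerpt above, so they may need adjusting if your log wording differs.

```shell
#!/bin/bash
# Hypothetical helper: count how often each error signature from this
# article appears in a backup log. The default path is the one named above.
scan_backup_log() {
    local log="${1:-/var/log/vmware/applmgmt/backup.log}"
    local sig
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # Signatures are matched as fixed strings (-F), not regular expressions.
    for sig in \
        "BrokenPipeError: [Errno 32] Broken pipe" \
        "Checksum not generated" \
        "BackupManager encountered an exception" \
        "Failed to dispatch" \
        "curl: (9) Upload failed"; do
        # Output format: <count><TAB><signature>
        printf '%s\t%s\n' "$(grep -cF -- "$sig" "$log")" "$sig"
    done
}
```

Calling scan_backup_log with no argument reads the default log path; passing a file name (for example, a copy of the log taken from a support bundle) scans that file instead. Non-zero counts for several signatures in the same backup window suggest the failure pattern described in this article.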


Environment

VMware vCenter Server 7.x
VMware vCenter Server 8.x
VMware vCenter Server 9.x

Cause

The backup failure error reported here is "BrokenPipeError: [Errno 32] Broken pipe". This primarily indicates that the vCenter Server is intermittently unable to reach the backup server, most often because of network congestion or bandwidth constraints during the scheduled backup window. Concurrent requests and network contention can also contribute, especially when multiple appliances or servers use the same backup server as their destination.
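
The "Process returncode is -13" entries in the log excerpt are consistent with this cause. Assuming the backup framework reports child process status using the Python subprocess convention (a negative value -N means the child was terminated by signal N), -13 corresponds to signal 13, which is SIGPIPE on Linux. The sketch below illustrates that mapping; the function name describe_rc is hypothetical and is not part of the appliance tooling.

```shell
#!/bin/bash
# Hypothetical helper: interpret a return code as reported in backup.log.
# Assumption: negative values follow the Python subprocess convention,
# where -N means "terminated by signal N".
describe_rc() {
    local rc="$1"
    if [ "$rc" -lt 0 ]; then
        # kill -l N prints the signal name for signal number N.
        echo "killed by SIG$(kill -l "$(( -rc ))")"
    else
        echo "exited with status $rc"
    fi
}

describe_rc -13
describe_rc 1
```

On a Linux system, the first call reports SIGPIPE, meaning the process was writing to a pipe whose reader had already gone away. The second call covers ordinary exit codes such as the "rc: 1" entry in the excerpt.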

Resolution

  1. Make sure that the backup server remains powered on and reachable throughout the entire backup window.

  2. Engage your network team to review the connectivity between the backup server and the vCenter Server during the scheduled backup window. One option is to perform a packet capture by running the following command on the vCenter Server:
    tcpdump -i eth0 host <BACKUP_SERVER_IP> -s 0 -vvv -w /tmp/VC_BACKUP.pcap

  3. To avoid conflicts with other scheduled operations, reschedule the backup to a different time slot, preferably outside production hours.
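
As a lightweight complement to the packet capture in step 2, reachability of the backup server can be sampled at intervals during the backup window. The following is a minimal sketch under stated assumptions: the function names probe_port and monitor are hypothetical, the probe relies on bash's /dev/tcp redirection and the coreutils timeout command being available on the appliance, and the port must match your backup protocol (typically 22 for SFTP or SCP and 445 for SMB).

```shell
#!/bin/bash
# probe_port HOST PORT: succeed if a TCP connection opens within 5 seconds.
# Uses bash's built-in /dev/tcp redirection; no data is sent.
probe_port() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# monitor HOST PORT INTERVAL COUNT: print one timestamped result per probe,
# waiting INTERVAL seconds between probes, for COUNT probes in total.
monitor() {
    local host="$1" port="$2" interval="$3" count="$4"
    local i=0
    while [ "$i" -lt "$count" ]; do
        if probe_port "$host" "$port"; then
            echo "$(date -u +%FT%TZ) ${host}:${port} reachable"
        else
            echo "$(date -u +%FT%TZ) ${host}:${port} UNREACHABLE"
        fi
        sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, monitor <BACKUP_SERVER_IP> 22 30 120 would sample the SFTP port every 30 seconds for roughly one hour. A successful TCP connect only shows that the port accepts connections; it does not measure bandwidth or congestion, so UNREACHABLE lines are a prompt for deeper analysis with the packet capture rather than a replacement for it.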

Additional Information

VCSA after upgrade to version 8 is not able to complete backup

vCenter backups to SMB file server consistently fail after reaching approximately 95% completion