Symptoms:
[YYYY-MM-DDTHH:MM:SS][VCDB-WAL-Backup:PID-22783] [VCDB::_backup_wal_files:VCDB.py:798] INFO: VCDB backup WAL start not received yet.
[YYYY-MM-DDTHH:MM:SS][MainProcess:PID-22696] [BackupManager::Cleanup:BackupManager.py:406] ERROR: Failed to clean up backup child processes.
Traceback (most recent call last):
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 583, in main
backupObj.DoBackup()
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 335, in DoBackup
self.LaunchBackupProcesses()
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 302, in LaunchBackupProcesses
self.ExecBackupsInParallel()
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 272, in ExecBackupsInParallel
taskId=self.args.id, operation='BACKUP')
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 202, in LaunchMultipleProcesses
timeout, logger)
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 178, in VerifyProcStatusAndGetArchive
(procRecord.process.name, procRecord.status.excMsg))
Exception: Hit exception inside process StatsMonitorDBBackup: Plugin error occurred. ErrCode: 6, Args: ()
Traceback (most recent call last):
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 251, in CleanupChildProcesses
proc.wait(timeout=30)
File "/usr/lib/python3.7/site-packages/psutil/__init__.py", line 1262, in wait
return self._proc.wait(timeout)
File "/usr/lib/python3.7/site-packages/psutil/_pslinux.py", line 1459, in wrapper
return fun(self, *args, **kwargs)
File "/usr/lib/python3.7/site-packages/psutil/_pslinux.py", line 1637, in wait
return _psposix.wait_pid(self.pid, timeout, self._name)
File "/usr/lib/python3.7/site-packages/psutil/_psposix.py", line 104, in wait_pid
delay = check_timeout(delay)
File "/usr/lib/python3.7/site-packages/psutil/_psposix.py", line 66, in check_timeout
raise TimeoutExpired(timeout, pid=pid, name=proc_name)
psutil._exceptions.TimeoutExpired: psutil.TimeoutExpired timeout after 30 seconds (pid=22769)
Traceback (most recent call last):
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 403, in Cleanup
CleanupChildProcesses()
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 259, in CleanupChildProcesses
raise Exception("%s" % str(e))
Exception: psutil.TimeoutExpired timeout after 30 seconds (pid=22769)
:
:
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [SMBClientStorageIOLib::_run_cmd:SMBClientStorageIOLib.py:23] INFO: Running Command: ['/usr/bin/smbclient', '//172.25.xx.xx/example_folder/', '-A', '/dev/shm/smbCredentialFilew2j6wztn', '--send-buffer', '100M', '-c', 'deltree vCenter/sn_vcenter.example.com/M_7.0.3.00800_20230615-041442_']
[YYYY-MM-DDTHH:MM:SS]] [VCDBBackup:PID-22769] [VCDB::BackupVCDB:VCDB.py:2057] ERROR: Encounter error during backup VCDB.
Traceback (most recent call last):
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 1993, in BackupVCDB
br_state.isFastBackupRequired())
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 571, in _start_pg_backup
"backupfast" : 'true' if backup_fast else 'false'})
psycopg2.OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[YYYY-MM-DDTHH:MM:SS] [VCDBBackup:PID-22769] [Proc::UpdateExceptionStatus:Proc.py:383] ERROR: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
:
:
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [BackupManager::HandleBackupCleanup:BackupManager.py:474] ERROR: Failed to cleanup the backup job. Error: Failed to clean up the backup job.
[YYYY-MM-DDTHH:MM:SS] [VCDB-WAL-Backup:PID-22783] [VCDB::_backup_wal_files:VCDB.py:798] INFO: VCDB backup WAL start not received yet.
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [vpxdevent_lib::dispatch_events:vpxdevent_lib.py:275] INFO: Event com.vmware.applmgmt.backup.job.failed.event successfully posted to http://localhost:8085/sdk
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [BackupManager::main:BackupManager.py:615] INFO: Backup job failed.
VMware vCenter Server 7.0
The issue is observed when the backup fails and the child process (VCDB-WAL-Backup) is not killed.
This issue has been fixed in 7.0 U3i (p06).
Workaround
vi /usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py
# Terminate active sub process WALBackup before exit BackupVCDB
# only for full backup
if br_state.isFullVCDBBackup():
if wal_backup_proc.is_alive():
logger.info("Terminate sub process %s" %wal_backup_proc.pid)
wal_backup_proc.kill()
----
except Exception as e:
logger.exception('Encounter error during backup VCDB.')
UpdateExceptionStatus(e, status, logger)
# if PG_TELEMETRY is enabled, collect backup data and send to VAC
# --data1: backup type(Full or Partial)
# --data2: backup status(Success or Fail)
# --data3: vCenter version
finally:
# Do not return an error in the event of a failure, as telemetry
# data loss should not result in a critical failure.
try:
if not telemetry.check_telemetry_enabled(enableLogging=False):
logger.info('Telemetry data is disabled')
else:
vc_ver = telemetry.get_vc_version()
----
----
except Exception as e:
logger.exception('Encounter error during backup VCDB.')
status.errCode = 1
status.errMsg = 'Encounter error during backup VCDB'
UpdateExceptionStatus(e, status, logger)
# if PG_TELEMETRY is enabled, collect backup data and send to VAC
# --data1: backup type(Full or Partial)
# --data2: backup status(Success or Fail)
# --data3: vCenter version
finally:
# Terminate active sub process WALBackup before exit BackupVCDB
# only for full backup
if br_state.isFullVCDBBackup():
if wal_backup_proc.is_alive():
logger.info("Terminate sub process %s" %wal_backup_proc.pid)
wal_backup_proc.kill()
# Do not return an error in the event of a failure, as telemetry
# data loss should not result in a critical failure.
try:
if not telemetry.check_telemetry_enabled(enableLogging=False):
logger.info('Telemetry data is disabled')
else:
vc_ver = telemetry.get_vc_version()
----