VCSA VAMI backups fail with an Error: "VCDB backup WAL start not yet received"
search cancel

VCSA VAMI backups fail with an Error: "VCDB backup WAL start not yet received"

book

Article ID: 316587

calendar_today

Updated On:

Products

VMware vCenter Server VMware vCenter Server 7.0

Issue/Introduction

Symptoms:

  • vCenter file-based backup fails with an Error "VCDB backup WAL start not yet received"
  • /var/log/vmware/applmgmt/backup.log
[YYYY-MM-DDTHH:MM:SS][VCDB-WAL-Backup:PID-22783] [VCDB::_backup_wal_files:VCDB.py:798] INFO: VCDB backup WAL start not received yet.

[YYYY-MM-DDTHH:MM:SS][MainProcess:PID-22696] [BackupManager::Cleanup:BackupManager.py:406] ERROR: Failed to clean up backup child processes.
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 583, in main
    backupObj.DoBackup()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 335, in DoBackup
    self.LaunchBackupProcesses()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 302, in LaunchBackupProcesses
    self.ExecBackupsInParallel()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 272, in ExecBackupsInParallel
    taskId=self.args.id, operation='BACKUP')
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 202, in LaunchMultipleProcesses
    timeout, logger)
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 178, in VerifyProcStatusAndGetArchive
    (procRecord.process.name, procRecord.status.excMsg))
Exception: Hit exception inside process StatsMonitorDBBackup: Plugin error occurred. ErrCode: 6, Args: ()
  • During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 251, in CleanupChildProcesses
    proc.wait(timeout=30)
  File "/usr/lib/python3.7/site-packages/psutil/__init__.py", line 1262, in wait
    return self._proc.wait(timeout)
  File "/usr/lib/python3.7/site-packages/psutil/_pslinux.py", line 1459, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/psutil/_pslinux.py", line 1637, in wait
    return _psposix.wait_pid(self.pid, timeout, self._name)
  File "/usr/lib/python3.7/site-packages/psutil/_psposix.py", line 104, in wait_pid
    delay = check_timeout(delay)
  File "/usr/lib/python3.7/site-packages/psutil/_psposix.py", line 66, in check_timeout
    raise TimeoutExpired(timeout, pid=pid, name=proc_name)
psutil._exceptions.TimeoutExpired: psutil.TimeoutExpired timeout after 30 seconds (pid=22769)

 

  • During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/BackupManager.py", line 403, in Cleanup
    CleanupChildProcesses()
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/util/Proc.py", line 259, in CleanupChildProcesses
    raise Exception("%s" % str(e))
Exception: psutil.TimeoutExpired timeout after 30 seconds (pid=22769)
:
:
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [SMBClientStorageIOLib::_run_cmd:SMBClientStorageIOLib.py:23] INFO: Running Command: ['/usr/bin/smbclient', '//172.25.xx.xx/example_folder/', '-A', '/dev/shm/smbCredentialFilew2j6wztn', '--send-buffer', '100M', '-c', 'deltree vCenter/sn_vcenter.example.com/M_7.0.3.00800_20230615-041442_']
[YYYY-MM-DDTHH:MM:SS]] [VCDBBackup:PID-22769] [VCDB::BackupVCDB:VCDB.py:2057] ERROR: Encounter error during backup VCDB.
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 1993, in BackupVCDB
    br_state.isFastBackupRequired())
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py", line 571, in _start_pg_backup
    "backupfast" : 'true' if backup_fast else 'false'})
psycopg2.OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[YYYY-MM-DDTHH:MM:SS] [VCDBBackup:PID-22769] [Proc::UpdateExceptionStatus:Proc.py:383] ERROR: terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
:
:
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [BackupManager::HandleBackupCleanup:BackupManager.py:474] ERROR: Failed to cleanup the backup job. Error: Failed to clean up the backup job.
[YYYY-MM-DDTHH:MM:SS] [VCDB-WAL-Backup:PID-22783] [VCDB::_backup_wal_files:VCDB.py:798] INFO: VCDB backup WAL start not received yet.
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [vpxdevent_lib::dispatch_events:vpxdevent_lib.py:275] INFO: Event com.vmware.applmgmt.backup.job.failed.event successfully posted to http://localhost:8085/sdk
[YYYY-MM-DDTHH:MM:SS] [MainProcess:PID-22696] [BackupManager::main:BackupManager.py:615] INFO: Backup job failed.



Environment

VMware vCenter Server 7.0

Cause

The issue is observed when the backup fails and the child process (VCDB-WAL-Backup) is not killed.

Resolution

This issue has been fixed in 7.0 U3i (p06).

Workaround

  1. Connect to vCenter using SSH.
  2. Open the following script file with the vi command.

    vi /usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/components/VCDB.py

  3. Add the following content to the script and save it.
        # Terminate active sub process WALBackup before exit BackupVCDB
        # only for full backup
        if br_state.isFullVCDBBackup():
            if wal_backup_proc.is_alive():
                logger.info("Terminate sub process %s" %wal_backup_proc.pid)
                wal_backup_proc.kill()

For example:
around line 2060

BEFORE
----
    except Exception as e:
        logger.exception('Encounter error during backup VCDB.')
        UpdateExceptionStatus(e, status, logger)
    # if PG_TELEMETRY is enabled, collect backup data and send to VAC
    #    --data1: backup type(Full or Partial)
    #    --data2: backup status(Success or Fail)
    #    --data3: vCenter version
    finally:
        # Do not return an error in the event of a failure, as telemetry
        # data loss should not result in a critical failure.
        try:
            if not telemetry.check_telemetry_enabled(enableLogging=False):
                logger.info('Telemetry data is disabled')
            else:
                vc_ver = telemetry.get_vc_version()
----

AFTER
----
     except Exception as e:
         logger.exception('Encounter error during backup VCDB.')
         status.errCode = 1
         status.errMsg = 'Encounter error during backup VCDB'
         UpdateExceptionStatus(e, status, logger)
     # if PG_TELEMETRY is enabled, collect backup data and send to VAC
     #    --data1: backup type(Full or Partial)
     #    --data2: backup status(Success or Fail)
     #    --data3: vCenter version
     finally:
         # Terminate active sub process WALBackup before exit BackupVCDB
         # only for full backup
         if br_state.isFullVCDBBackup():
             if wal_backup_proc.is_alive():
                 logger.info("Terminate sub process %s" %wal_backup_proc.pid)
                 wal_backup_proc.kill()
         # Do not return an error in the event of a failure, as telemetry
         # data loss should not result in a critical failure.
         try:
             if not telemetry.check_telemetry_enabled(enableLogging=False):
                 logger.info('Telemetry data is disabled')
             else:
                 vc_ver = telemetry.get_vc_version()
----