Backup or restore fails because it's already running

Article ID: 430312

Updated On:

Products

VCF Automation

Issue/Introduction

A backup or restore request fails with the message "Backup/restore is already running, waiting to retry". Subsequent backup/restore requests cannot run for up to 1 hour.

Environment

VCF 9.0

Cause

Backup and restore workflows acquire a lock so that only one backup/restore process can run at a time. The lock is released when the process succeeds or fails. However, in rare cases the process is terminated before it can release the lock. Because the lock expires after 1 hour, no backup/restore workflow can run until it expires.

Resolution

If the 1-hour delay before the next backup/restore run is not acceptable, the lock can be removed manually. Before running the following steps, make sure no backup/restore workflow is running.

  1. Identify one of the VMs that belong to VCF Automation or Identity Broker and note its IP address.
  2. SSH into that VM and delete the lock:

       ssh vmware-system-user@<node ip>
       sudo -i
       export KUBECONFIG=/etc/kubernetes/admin.conf
       kubectl delete cm vmsp-backup-state -n vmsp-platform

  3. Run the backup/restore again.
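If waiting out the expiry is an option, it can help to know how much of the hour remains before deciding to delete the lock. The following is a minimal sketch that is not part of the documented procedure. It assumes that the 1-hour expiry is counted from the vmsp-backup-state ConfigMap's metadata.creationTimestamp (the actual expiry logic is not described in this article and may differ) and that GNU date is available. The names lock_expired and LOCK_TTL_SECONDS are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: reports whether a lock created at the given UTC timestamp has
# passed an assumed 1-hour TTL.
# ASSUMPTION: expiry is measured from the ConfigMap's creation time; the real
# expiry logic may differ. Requires GNU date (for "date -d").

LOCK_TTL_SECONDS=3600  # hypothetical name; 1 hour, per the Cause section

# lock_expired CREATED [NOW]
#   CREATED: ISO 8601 UTC timestamp, e.g. 2024-01-01T00:00:00Z
#   NOW:     optional epoch seconds (defaults to the current time)
# Prints "expired" or "active (<N>s left)"; returns 2 if CREATED cannot be parsed.
lock_expired() {
  local created="$1"
  local now="${2:-$(date -u +%s)}"
  local created_s age
  # Convert the timestamp to epoch seconds; bail out if date rejects it.
  created_s=$(date -u -d "$created" +%s) || return 2
  age=$(( now - created_s ))
  if [ "$age" -ge "$LOCK_TTL_SECONDS" ]; then
    echo "expired"
  else
    echo "active ($(( LOCK_TTL_SECONDS - age ))s left)"
  fi
}
```

On the node, after running sudo -i and exporting KUBECONFIG as in step 2, the function could be fed the ConfigMap's creation time:

       lock_expired "$(kubectl get cm vmsp-backup-state -n vmsp-platform -o jsonpath='{.metadata.creationTimestamp}')"

The optional NOW argument keeps the calculation deterministic, so the logic can be checked without a cluster. If the helper reports "expired" while requests are still rejected, the creation-time assumption does not hold and the manual steps above remain the way to clear the lock.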