VCF Fleet Management restore failure due to vrlcm Service Stop Script Issue 'Stopping vrlcm service...'

Article ID: 428583

Products

VCF Operations

Issue/Introduction

  • The VCF Fleet Management appliance restore operation failed while restoring from a valid backup.
  • The process did not progress even after waiting more than 24 hours.
  • lcm-restore.log showed the restore halting during the service stop phase:
    Stopping vrlcm service...
  • vmware_vrlcm.log showed "errorCause" : null, giving no explicit failure reason:
  INFO vrlcm[1249] [http-nio-8080-exec-9] [c.v.v.l.l.u.RequestSubmissionUtil] – ++++++++++++++++++ Creating request to Request_Service :::>>> {
 "vmid" : "vm_task_id",
 "transactionId" : null,
 "tenant" : "default",
 "requestName" : "lcmvarestore",
 "requestReason" : "Trigger LCM VA Restore",
 "requestType" : "lcmvarestore",
 "requestSource" : null,
 "requestSourceType" : "user",
 "inputMap" : {
 "backupArchivePath" : "/data/lcm-backup-<data_time>.tar.gz"


 INFO vrlcm[1249] [scheduling-1] [c.v.v.l.a.c.EventProcessor] – INITIALIZING NEW EVENT :: {
 "vmid" : "vm_task_id",
 "transactionId" : null,
 "tenant" : "default",
 "createdBy" : "root",
 "lastModifiedBy" : "root",
 "createdOn" : 1769058819225,
 "lastUpdatedOn" : 1769058819801,
 "version" : "9.0.0.0",
 "vrn" : null,
 "eventName" : "OnStart",
 "currentState" : null,
 "eventArgument" : "{\"productSpec\":{\"name\":\"productSpec\",\"type\":\"com.vmware.vrealize.lcm.domain.ProductSpecification\",\"value\":\"{\\\"symbolicName\\\":\\\"lcmvarestore\\\",\\\"displayName\\\":null,\\\"productVersion\\\":null,\\\"priority\\\":0,\\\"dependsOn\\\":[],\\\"components\\\":[{\\\"component\\\":{\\\"symbolicName\\\":\\\"lcmvarestore\\\",\\\"type\\\":null,\\\"componentVersion\\\":null,\\\"properties\\\":{
 \\\"backupArchivePath\\\":\\\"/data/lcm-backup-<data_time>.tar.gz\\\",\\\"isVcfUser\\\":\\\"true\\\"}},\\\"priority\\\":0}]}\"}}",
 "status" : "CREATED",
 "stateMachineInstance" : "######-####-####-####-######",
 "errorCause" : null,
 "sequence" : 336,
 "eventLock" : 1,
 "engineNodeId" : "fleetmgmt_node_fqdn"
 } 
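To confirm that a stalled restore matches this symptom, a check along the following lines could be used. It is a minimal sketch. The article does not say where lcm-restore.log is stored, so the log path is passed in as an argument, and the helper name `restore_stuck_at_stop` is hypothetical. It only tests whether the last non-empty log line is the stop message shown above.

```shell
#!/bin/sh
# Sketch: detect the stalled-restore symptom described in this article.
# Assumption: the caller supplies the path to lcm-restore.log, because its
# location is not given in the article.

# restore_stuck_at_stop LOGFILE (hypothetical helper)
# Returns 0 if the last non-empty line of LOGFILE contains
# "Stopping vrlcm service...", and 1 otherwise (including when the file
# is missing or unreadable).
restore_stuck_at_stop() {
  log_file=$1
  [ -r "$log_file" ] || return 1
  # Drop blank lines, then keep only the final line of the log.
  last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
  case $last_line in
    *"Stopping vrlcm service..."*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `restore_stuck_at_stop /path/to/lcm-restore.log && systemctl is-active vrlcm-server.service` prints the service state only when the log ends at the stop message; an `inactive` result would be consistent with the cause described below.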

 

Environment

VCF Operations 9.0.x

 

Cause

The restore script (/var/lib/vlcm-common/lcm-restore.sh) attempts to stop vrlcm-server.service even when the service is already stopped. In that case the stop command returns a non-zero exit code, which halts the restore script and prevents the restore operation from progressing.
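The article does not show how lcm-restore.sh handles command failures. Assuming it stops at the first non-zero exit status (for example, because it runs under `set -e`), the general pattern below would let a stop step succeed when the service is already stopped: ask systemd whether the unit is active and only then stop it. This is a sketch of the pattern, not Broadcom's planned fix, and `stop_unit_if_active` is a hypothetical name.

```shell
#!/bin/sh
# Sketch of an idempotent stop step. Assumption (not confirmed by the
# article): the calling script aborts on any non-zero exit status.

# stop_unit_if_active UNIT (hypothetical helper)
# Calls "systemctl stop" only when "systemctl is-active --quiet" reports
# the unit as active; an already-stopped unit is skipped with status 0.
stop_unit_if_active() {
  unit=$1
  if systemctl is-active --quiet "$unit"; then
    # Unit is running: the stop's own exit status is returned.
    systemctl stop "$unit"
  else
    # Unit is not running: nothing to stop, report and succeed.
    echo "$unit is not active; skipping stop"
  fi
}
```

In such a script, `stop_unit_if_active vrlcm-server.service` would take the place of the plain `systemctl stop vrlcm-server.service` call. `systemctl is-active --quiet` prints nothing and reports the state only through its exit status, which keeps the check out of the restore output.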

Resolution

Broadcom is aware of this issue in the restore script and plans to fix it in a future update.

Workaround:
  1. Take a snapshot (without memory) of the Fleet Management VM.
  2. SSH to the Fleet Management node and stop the vrlcm-server service from the CLI:
    systemctl stop vrlcm-server.service

  3. In /var/lib/vlcm-common/lcm-restore.sh, comment out the service stop line as shown:
    systemctl stop vrlcm-server.service  -->  #systemctl stop vrlcm-server.service
  4. Start the restore manually from the CLI by running the script with the backup archive path (replace /data/lcm-backup.tar.gz with the actual archive name):
    /var/lib/vlcm-common/lcm-restore.sh /data/lcm-backup.tar.gz
  5. Monitor the restore from the same SSH session. During the restore, services such as vpostgres, nginx, and vrlcm are restarted.
  6. After the restore completes, run an Inventory Sync of VCF Operations from the Fleet Management Life Cycle tab. This step is mandatory.
    Note: This step ensures the certificate is properly updated between VCF Operations and Fleet Management. Skipping it causes a communication failure between VCF Operations and Fleet Management.
  7. Once the Inventory Sync completes, the restore operation is finished and Fleet Management is ready to use.
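Steps 2 to 4 can also be run as one sequence. The sketch below assumes it runs as root on the Fleet Management node and uses the script path and unit name given in the steps; the helper names `comment_out_stop_line` and `run_restore_workaround` are hypothetical. It keeps an unmodified copy of the script as lcm-restore.sh.bak and checks the service state after the stop instead of relying on the stop command's exit code. The snapshot (step 1) and the Inventory Sync (step 6) still have to be done as described.

```shell
#!/bin/sh
# Sketch of workaround steps 2-4. Assumptions: run as root on the Fleet
# Management node; script path and unit name are the ones in this article.
# Take the VM snapshot (step 1) before running this.

RESTORE_SCRIPT=/var/lib/vlcm-common/lcm-restore.sh
UNIT=vrlcm-server.service

# comment_out_stop_line FILE (hypothetical helper)
# Prefixes "#" to uncommented lines that start (after optional indentation)
# with "systemctl stop vrlcm-server.service". Already-commented lines are
# left alone, so running it twice is harmless.
comment_out_stop_line() {
  file=$1
  # Keep an unmodified copy on the first run only.
  [ -e "$file.bak" ] || cp -p "$file" "$file.bak" || return 1
  # Edit via a temp file, then write the result back with ">" so the
  # script keeps its owner and permissions (including the execute bit).
  sed 's/^\([[:space:]]*\)systemctl stop vrlcm-server\.service/\1#systemctl stop vrlcm-server.service/' \
    "$file" > "$file.tmp" || return 1
  cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# run_restore_workaround ARCHIVE (hypothetical helper)
run_restore_workaround() {
  archive=$1
  if [ ! -r "$archive" ]; then
    echo "Backup archive not readable: $archive" >&2
    return 1
  fi

  # Step 2: stop the service. Its exit status is not trusted on its own
  # (see Cause); the resulting state is checked instead.
  systemctl stop "$UNIT"
  state=$(systemctl is-active "$UNIT")
  if [ "$state" = active ]; then
    echo "$UNIT is still active; not continuing" >&2
    return 1
  fi
  echo "$UNIT is $state"

  # Step 3: comment out the stop line in the restore script.
  comment_out_stop_line "$RESTORE_SCRIPT" || return 1

  # Step 4: run the restore; its progress prints to this session (step 5).
  "$RESTORE_SCRIPT" "$archive"
}
```

After sourcing these functions into a root shell, a call such as `run_restore_workaround /data/lcm-backup.tar.gz` (with the real archive name) would perform steps 2 to 4 and stream the restore output to the same session.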