Unable to cancel VCF upgrade when previous SDDC Manager tasks have failed with "Internal Error"


Article ID: 322303

Products

VMware Cloud Foundation

Issue/Introduction

Symptoms:

  • Attempting to cancel the upgrade fails with the following error:

    Cancelling bundle update failed.UPGRADE_CANCEL_CONFLICT; Failed to cancel upgrade; UpgradeId={Upgrade_ID}. Reason: Upgrade: {Upgrade_ID} cannot be cancelled in the given state: INPROGRESS as the only remaining upgrade element is in progress

  • The following message may be displayed:
    "UPGRADE_GET_FAILED; Failed to get upgrade; UpgradeId={Upgrade_ID}. Reason: No enum constant com.vmware.evo.sddc.lcm.model.upgrade.UpgradeStatus.FAILED"

  • VxRail Manager, SDDC Manager, and the management vCenter have all been rebooted with no change. SDDC Manager still shows the upgrade as in progress even though it has stopped on VxRail Manager.

  • In the log file /var/log/vmware/vcf/lcm/lcm.log, entries similar to the following are seen:
    YEARDATETIME INFO [51209ef353aa0dd6,91af] [c.v.e.s.l.a.i.i.LogicalInventoryClient,pool-5-thread-5] acquired the resource lock for the resource : { "status": "FAILED", "createdTimeStamp": 0, "errorMessage": "DOMAIN ({DOMAIN_ID}) currently locked by below resources (/LCM/deployment/) having Description : Acquired DEPLOYMENT level lock for VMWARE_SOFTWARE upgrades", "resourceType": "DOMAIN" }
    YEARDATETIME WARN [51209ef353aa0dd6,91af] [c.v.evo.sddc.lcm.orch.Orchestrator,pool-5-thread-5] Cannot start upgrades since there are pending or, failed workflows
    YEARDATETIME INFO [51209ef353aa0dd6,91af] [c.v.e.s.l.e.s.i.LcmEventServiceImpl,pool-5-thread-5] Creating LCM audit event for notification for eventName: UPGRADE_CANCELLED
    YEARDATETIME INFO [51209ef353aa0dd6,91af] [c.v.e.s.l.e.s.i.LcmEventServiceImpl,pool-5-thread-5] Creating LCM audit event for notification for eventName: UPGRADE_ABORTED
    YEARDATETIME ERROR [51209ef353aa0dd6,91af] [c.v.evo.sddc.lcm.orch.Orchestrator,pool-5-thread-5] Couldn't acquire lock for the domain {DOMAIN_ID}
    YEARDATETIME INFO [51209ef353aa0dd6,91af] [c.v.evo.sddc.lcm.orch.Orchestrator,pool-5-thread-5] have acquired all the locks required for the upgrade : false
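
To check whether the log shows the lock-related entries above, a search along the following lines may help. This is a minimal sketch: the function name find_lock_errors is hypothetical, and the search strings are taken from the sample log lines in this article, so they may need adjusting for your version.

```shell
#!/bin/bash
# Hypothetical helper (not part of SDDC Manager): print LCM log lines that
# match the lock-related messages quoted in this article.
find_lock_errors() {
    # Default path is the LCM log named in this article; pass another path to override.
    local log_file="${1:-/var/log/vmware/vcf/lcm/lcm.log}"
    grep -E "Couldn't acquire lock for the domain|currently locked by below resources|Cannot start upgrades" "$log_file"
}

# Usage on the SDDC Manager appliance:
#   find_lock_errors
#   find_lock_errors /path/to/a/copy/of/lcm.log
```

If the function prints nothing, none of the three messages are in the log, and the symptoms may have a different cause.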

Environment

VMware Cloud Foundation 4.x

VMware Cloud Foundation 5.x

Cause

This issue occurs when a previous SDDC Manager task fails with an "Internal Error" and the lock it acquired is not released.

Resolution

 

To clear the lock, follow the steps below. Before proceeding, take a snapshot of the SDDC Manager VM.

  1. SSH to the SDDC Manager appliance and switch to the root user.
  2. Find the lock by reviewing the output of either of these commands:
    curl localhost/locks | jq
    OR
    psql -h localhost -U postgres -d platform -c "select * from lock;"
  3. Remove all entries from the lock table in the database:
    psql -h localhost -U postgres -d platform -c "truncate lock;"
  4. Restart the LCM service:
    systemctl restart lcm
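
Steps 2 to 4 can also be collected into a small script, sketched below. The commands are the ones listed above. The run wrapper, the DRY_RUN variable, and the function name clear_stale_locks are assumptions added for illustration. DRY_RUN defaults to 1, so the script only prints the commands it would run.

```shell
#!/bin/bash
# Sketch only: wraps the commands from steps 2 to 4 of this article.
# DRY_RUN, run, and clear_stale_locks are hypothetical names, not VCF tooling.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

clear_stale_locks() {
    # Step 2: show the current contents of the lock table for review.
    run psql -h localhost -U postgres -d platform -c "select * from lock;"
    # Step 3: remove all rows from the lock table.
    run psql -h localhost -U postgres -d platform -c "truncate lock;"
    # Step 4: restart the LCM service.
    run systemctl restart lcm
}

clear_stale_locks
```

Run it as root on the SDDC Manager appliance, and only after the VM snapshot has been taken. Review the dry-run output first, then rerun with DRY_RUN=0 to execute the commands.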