Decommissioning hosts from SDDC Manager fails to put hosts into Maintenance Mode with Error: The task was interrupted

Article ID: 428293

Products

VMware SDDC Manager
VMware Cloud Foundation

Issue/Introduction

  • When attempting to decommission an ESXi host from a workload domain in VMware Cloud Foundation (VCF), the task becomes stuck or fails in the SDDC Manager UI.
  • The host remains in the Deactivating state and the decommissioning task does not complete.

  • SDDC Manager UI displays the following error: The task was interrupted

  • In /var/log/vmware/vcf/domainmanager/domainmanager.log, entries similar to the following are logged:

    YYYY-MM-DDTHH:MM:SS DEBUG [vcf_dm] [c.v.e.s.c.c.v.vsphere.VcManagerBase] Task: (MOR:task-######) (Name:enterMaintenanceMode) Entity: (MOR:host-##) (Name:<esxi hostname>) status: running. Waiting for its complete
    YYYY-MM-DDTHH:MM:SS INFO  [vcf_dm] [c.v.e.s.c.c.v.vsphere.VcManagerBase] Getting status of task Task: (MOR:task-######) (Name:unknown)
    YYYY-MM-DDTHH:MM:SS WARN  [vcf_dm] [c.v.v.v.c.h.i.HttpProtocolBindingBase] Asynch
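
To confirm the symptom quickly, the log can be filtered for enterMaintenanceMode tasks that domainmanager is still waiting on. The following is a minimal sketch, assuming the log line format shown above; the function name find_waiting_mm_tasks is hypothetical and is not part of SDDC Manager.

```shell
#!/usr/bin/env bash
# Sketch only: find_waiting_mm_tasks is a hypothetical helper, not a VCF tool.
# The default log path is the one named in this article.
LOG_FILE="${LOG_FILE:-/var/log/vmware/vcf/domainmanager/domainmanager.log}"

find_waiting_mm_tasks() {
  # Keep only enterMaintenanceMode entries still reported as "running",
  # then reduce each one to "<task MOR> <host name>" and drop duplicates.
  grep -E 'Name:enterMaintenanceMode\).*status: running' "${1:-$LOG_FILE}" \
    | sed -E 's/.*Task: \(MOR:([^)]*)\).*Entity: \(MOR:[^)]*\) \(Name:([^)]*)\).*/\1 \2/' \
    | sort -u
}
```

Running find_waiting_mm_tasks with no argument reads the default log path; pass a file name to inspect a copied or rotated log. Each output line pairs a task MOR with the host name it targets.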

Environment

  • VMware Cloud Foundation (VCF) 5.x / 9.x
  • SDDC Manager 5.x / 9.x

Cause

  • This issue typically occurs if the ESXi host is rebooted or loses connectivity while the "Enter Maintenance Mode" task is in progress.
  • A common trigger is a manual host reboot performed because a Virtual Machine (VM) was stuck and blocking the maintenance mode evacuation.
  • When the host reboots, the vCenter task is terminated abruptly, leaving the SDDC Manager "domainmanager" service waiting on a task whose status no longer exists or is reported as "unknown".

Resolution

To resolve this issue, manually reset the host's status in the SDDC Manager database, and then re-initiate the decommission process.

  1. Preparation

    1. Take a snapshot of the SDDC Manager VM before proceeding with database modifications.
    2. Log in to the SDDC Manager VM via SSH as the vcf user.
    3. Switch to the root user: su

  2. Identify and Cancel the Stuck Task

    1. Access the SDDC Manager platform database:

      psql -h localhost -U postgres -d platform

    2. Identify the ID of the host that is stuck in the DEACTIVATING state:

      SELECT HOSTNAME, ID FROM HOST WHERE STATUS='DEACTIVATING';

      Example output:

      platform=# SELECT HOSTNAME, ID FROM HOST WHERE STATUS='DEACTIVATING';
           hostname     |                  id
      ------------------+--------------------------------------
       esxi.example.com | ########-####-####-####-############
      (1 row)

    3. Cancel the stuck operation by setting the host's status to ACTIVE:

      UPDATE HOST SET STATUS='ACTIVE' WHERE ID='<ID from previous step>';

      Example output:

      platform=# UPDATE HOST SET STATUS='ACTIVE' WHERE ID='########-####-####-####-############';
      UPDATE 1

    4. Exit the database:

      \q


    Note:
    Before retrying, ensure the ESXi host is reachable and in a healthy state. If necessary, manually place it into Maintenance Mode in vCenter to confirm that no VMs are blocking the evacuation.

  3. Retry Decommissioning

    1. Log back in to the SDDC Manager UI.

    2. Navigate to Inventory > click the required Domain > select the Host(s), then click REMOVE SELECTED HOSTS.
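
The database update in step 2 can also be generated by a small helper, which reduces the risk of pasting a malformed ID. The following is a minimal sketch, not an official procedure: build_reset_sql and is_uuid are hypothetical names, the UUID format check is an assumption based on the ID layout in the example output, and the additional STATUS='DEACTIVATING' condition is an added safeguard that is not part of the statement above. The helper only prints SQL; it does not connect to the database.

```shell
#!/usr/bin/env bash
# Sketch only: build_reset_sql and is_uuid are hypothetical helpers, not VCF tools.

is_uuid() {
  # Reject anything containing characters other than hex digits and '-'.
  case "$1" in
    *[!0-9a-fA-F-]*) return 1 ;;
  esac
  # Assumption: host IDs use the 8-4-4-4-12 UUID layout.
  printf '%s\n' "$1" \
    | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

build_reset_sql() {
  local host_id="$1"

  if ! is_uuid "$host_id"; then
    echo "error: '$host_id' does not look like a host ID" >&2
    return 1
  fi

  # The STATUS condition is an added safeguard (not in the article):
  # the row changes only if the host is still DEACTIVATING.
  cat <<EOF
BEGIN;
UPDATE HOST SET STATUS='ACTIVE' WHERE ID='${host_id}' AND STATUS='DEACTIVATING';
SELECT HOSTNAME, ID, STATUS FROM HOST WHERE ID='${host_id}';
COMMIT;
EOF
}
```

Review the printed statements first. To apply them, pipe the output into the same psql command used in step 2, for example: build_reset_sql "<ID from step 2>" | psql -h localhost -U postgres -d platform. A result of UPDATE 1 means the row was changed; UPDATE 0 means the host was no longer in the DEACTIVATING state. Taking the SDDC Manager VM snapshot described in step 1 remains a prerequisite.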