CNS tasks are hung in vCenter Server after a volume is detached from a virtual machine

Article ID: 431748

Products

VMware vCenter Server 8.0

Issue/Introduction

This article covers a specific corner case in which CNS tasks hang in vCenter Server after a volume is detached from a virtual machine.

Note: All of the symptoms below must match to conclude that the task is hung. Steps 6 and 8 confirm the issue; steps 1-5 and 7 show how to locate the relevant log messages.

  1. The task shows a QUEUED status in the vSphere UI for a long time.
  2. Identify the task ID in the vsanvcmgmtd.log file.
    For example, for the pending task ID 1722, the following event is recorded along with the corresponding task type, deletesnapshot:
    Log file: /var/log/vmware/vsan-health/vsanvcmgmtd.log
    
    YYYY-MM-DDT16:55:59.333Z INFO vsanvcmgmtd 52040 [vc@4413 sub="CnsTask" opId="2cbc21c4"] A com.vmware.cns.tasks.deletesnapshot task is created: task-1722
  3. Note the opId from the log entry above, for example opId="2cbc21c4".

  4. Using the opId 2cbc21c4, find the following entry in vsanvcmgmtd.log. The entry indicates the number of tasks already in the queue, for example "1 tasks are already in queue".

    Log file: /var/log/vmware/vsan-health/vsanvcmgmtd.log
    
    YYYY-MM-DDT16:55:59.338Z INFO vsanvcmgmtd 52040 [vc@4413 sub="WorkflowManager" opId="2cbc21c4"] Delete Snapshot task conflicting with resource vm-1087. 1 tasks are already in queue
  5. Identify the VM moid (managed object ID) from the log entry above, for example vm-1087.

  6. Using the VM moid from step 5, search the vsanvcmgmtd logs for the latest events containing "resource vm-###". The log event MUST indicate that more than 0 tasks are already in the queue.

    $> grep "resource vm-1087" vsanvcmgmtd-3.log
    
    YYYY-MM-DDT15:54:20.578Z INFO vsanvcmgmtd 52067 [vc@4413 sub="WorkflowManager" opId="2cbc1b5a"] Detach volume task conflicting with resource vm-1087. 0 tasks are already in queue
    YYYY-MM-DDT15:54:20.892Z INFO vsanvcmgmtd 52121 [vc@4413 sub="WorkflowManager" opId="2cbc1b58"] invoking next workflow Detach volume pending on resource vm-1087
    YYYY-MM-DDT15:54:28.791Z INFO vsanvcmgmtd 52024 [vc@4413 sub="WorkflowManager" opId="2cbc1b78"] Update volume task conflicting with resource vm-1087. 0 tasks are already in queue
    YYYY-MM-DDT15:54:28.847Z INFO vsanvcmgmtd 51854 [vc@4413 sub="WorkflowManager" opId="2cbc1b6f"] invoking next workflow Update volume pending on resource vm-1087
    YYYY-MM-DDT16:55:59.143Z INFO vsanvcmgmtd 52131 [vc@4413 sub="WorkflowManager" opId="2cbc21c1"] Delete Snapshot task conflicting with resource vm-1087. 0 tasks are already in queue
    YYYY-MM-DDT16:55:59.338Z INFO vsanvcmgmtd 52040 [vc@4413 sub="WorkflowManager" opId="2cbc21c4"] Delete Snapshot task conflicting with resource vm-1087. 1 tasks are already in queue
    YYYY-MM-DDT16:55:59.845Z INFO vsanvcmgmtd 52132 [vc@4413 sub="WorkflowManager" opId="2cbc21be"] invoking next workflow Delete Snapshot pending on resource vm-1087
  7. Identify the opId of the latest event from step 6, for example opId="2cbc21be" in:
    YYYY-MM-DDT16:55:59.845Z INFO vsanvcmgmtd 52132 [vc@4413 sub="WorkflowManager" opId="2cbc21be"] invoking next workflow Delete Snapshot pending on resource vm-1087
  8. Using the opId from step 7, find the matching entry in the vsanvcmgmtd logs. This entry MUST be a detach operation, for example Invoking 'detach' on 'cns-volume-manager' for opId 2cbc21be:
    YYYY-MM-DDT16:55:58.846Z INFO vsanvcmgmtd 52041 [vc@4413 sub="AdapterServer" opId="2cbc21be"] Invoking 'detach' on 'cns-volume-manager' session '528a5549-302b-2590-8b4f-a1d85110e303' active 1/1
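The checks in steps 2-8 can be scripted. The following is a minimal bash sketch, not a VMware-supplied tool: the function name check_cns_hang is hypothetical, and it assumes GNU grep/sed and the log line formats shown in the examples above. It only reads the log files passed to it and changes nothing on vCenter Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: automates steps 2-8 of this article for one CNS task ID.
# Assumes GNU grep/sed and the vsanvcmgmtd log formats shown in the examples.
# Usage: check_cns_hang <task-number> <log-file>...
# Returns 0 if steps 6 and 8 both match, 1 if they do not, 2 if data is missing.
check_cns_hang() {
  local task_id=$1
  shift
  local logs=("$@")
  local opid_re='s/.*opId="\([^"]*\)".*/\1/p'

  # Steps 2-3: opId of the event that created task-<task_id>.
  local create_opid
  create_opid=$(grep -hE "task is created: task-${task_id}([^0-9]|\$)" "${logs[@]}" \
    | sed -n "$opid_re" | tail -n 1)
  if [ -z "$create_opid" ]; then
    echo "task-${task_id}: creation event not found"
    return 2
  fi

  # Steps 4-5: VM moid from the "conflicting with resource vm-###" entry for that opId.
  local moid
  moid=$(grep -hF "opId=\"${create_opid}\"" "${logs[@]}" \
    | grep -o 'conflicting with resource vm-[0-9]*' | tail -n 1 | awk '{print $NF}')
  if [ -z "$moid" ]; then
    echo "opId ${create_opid}: no resource conflict logged"
    return 2
  fi

  # Step 6: all events for this resource, oldest first (each line starts with a timestamp).
  # queued = count from the most recent "tasks are already in queue" event.
  local events queued
  events=$(grep -hE "resource ${moid}([^0-9]|\$)" "${logs[@]}" | LC_ALL=C sort)
  queued=$(printf '%s\n' "$events" \
    | sed -n 's/.*\. \([0-9][0-9]*\) tasks are already in queue.*/\1/p' | tail -n 1)
  if [ "${queued:-0}" -le 0 ]; then
    echo "${moid}: no tasks reported in queue; not this issue"
    return 1
  fi

  # Step 7: opId of the latest event for this resource.
  local last_opid
  last_opid=$(printf '%s\n' "$events" | tail -n 1 | sed -n "$opid_re")

  # Step 8: that opId must belong to a detach on cns-volume-manager.
  if grep -hF "opId=\"${last_opid}\"" "${logs[@]}" \
      | grep -qF "Invoking 'detach' on 'cns-volume-manager'"; then
    echo "MATCH: ${moid} reports ${queued} queued task(s); latest opId ${last_opid} is a detach"
    return 0
  fi
  echo "${moid}: latest opId ${last_opid} is not a detach; not this issue"
  return 1
}
```

For example, after sourcing the function, run check_cns_hang 1722 /var/log/vmware/vsan-health/vsanvcmgmtd*.log. Lines from several rotated files are merged and sorted by their leading timestamp, so the glob order does not matter; compressed rotations are not searched by this sketch.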


Environment

vSphere 8.0 Update 3 and later

Cause

In rare circumstances, CNS tasks hang because detaching a volume triggers a status change that prematurely clears the VM's task queue.

Resolution

This issue will be fixed in a future release of vCenter Server.

To work around the issue, manually restart the vsan-health service on vCenter Server. 

Note: Restarting the vsan-health service forces queued tasks in vCenter Server to fail. As a result, any ongoing Kubernetes (k8s) tasks may also be forced to fail.

Restart the vsan-health service using the following command:

$ service-control --stop vmware-vsan-health && service-control --start vmware-vsan-health
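The same workaround can be wrapped in a small bash function. This is a minimal sketch with a hypothetical function name, restart_vsan_health. Like the && in the command above, it starts the service only if the stop succeeds; the final service-control --status call is an extra verification step that is not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented restart of vmware-vsan-health.
restart_vsan_health() {
  # Stop first; do not attempt a start if the stop fails.
  service-control --stop vmware-vsan-health || return 1
  service-control --start vmware-vsan-health || return 1
  # Extra check (not in the original procedure): show the service status.
  service-control --status vmware-vsan-health
}
```

Run it from the vCenter Server Appliance shell after sourcing it. Queued CNS tasks fail as described in the note above.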