In vRealize Automation 7.x HA mode - machine deployments are stuck in "Requested" state - Provisioning or data collection requests are stuck in "In progress" status

Article ID: 314778


Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

vRealize Automation 7.0, 7.0.1, 7.1, and 7.2

  • Provisioning or data collection requests are stuck in "In progress" status.
  • The load balancer monitor may or may not display more than one Manager Service node as active.
  • Machine deployments get stuck in "Requested" state.
  • The logs indicate that more than one Manager Service is active at the same time. This can be verified by examining a passive Manager Service log. You may see messages similar to the following after the node has entered the passive state:
[UTC:2017-09-19 21:00:35 Local:2017-09-19 17:00:35] [Info]: [sub-thread-Id="6" context="" token=""] Successfully marked current node as passive in the database....
[UTC:2017-09-19 21:02:38 Local:2017-09-19 17:02:38] [Debug]: [sub-thread-Id="12" context="" token=""] DC: Created data collection item, WorkflowInstanceId 195158, Task state, EntityID <Entity_ID>, StatusID = <Status_ID>
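As a quick check, the passive node's log can be scanned for work performed after the node reports going passive. The sketch below is a minimal, hypothetical parser for log lines in the format shown above; the marker string and timestamp layout are taken from the excerpt, everything else (function name, usage) is an assumption:

```python
import re
from datetime import datetime

# Marker emitted when the node enters the passive state (from the excerpt above).
PASSIVE_MARKER = "Successfully marked current node as passive"
# Matches the UTC timestamp at the start of each log line.
TS_RE = re.compile(r"\[UTC:(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def find_activity_after_passive(lines):
    """Return log lines whose UTC timestamp is later than the
    'marked passive' entry. A non-empty result suggests the passive
    Manager Service is still executing scheduled work (the race
    condition described in this article)."""
    passive_ts = None
    suspicious = []
    for line in lines:
        m = TS_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if PASSIVE_MARKER in line:
            passive_ts = ts
        elif passive_ts is not None and ts > passive_ts:
            suspicious.append(line)
    return suspicious
```

Running this over the two example lines above would flag the data collection entry, since it is timestamped after the node was marked passive.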

vRealize Automation 7.3

  • Machine deployments get stuck in "Requested" state.
  • Event broker subscriptions do not progress past the VMPSMasterWorkflow32.Requested phase PRE event state.
  • The IaaS Manager Service was recently failed over.


Environment

VMware vRealize Automation 7.x

Cause

vRealize Automation 7.0, 7.0.1, 7.1, and 7.2

Running multiple Manager Services simultaneously causes a race condition. This is generally caused by a misconfiguration during installation, or by an administrator manually failing over the services without first stopping the problematic node.

Note: Manager Service automatic failover was introduced in vRealize Automation 7.3.

vRealize Automation 7.3

In vRealize Automation 7.3, upon IaaS Manager failover, the passive Manager Service is not able to stop one or more of the scheduled operations it manages (for example, Data Collection), and these operations continue to execute after the node has entered the passive state. This results in multiple nodes racing for the same tasks, leaving some of those tasks in an inconsistent state.

Resolution

vRealize Automation 7.0, 7.0.1, 7.1, 7.2, and 7.3

The automatic Manager Service failover trigger for this issue is resolved in vRealize Automation 7.3.1 and later, available at VMware Downloads.

Although this specific trigger (the automatic Manager Service failover logic) is fixed in 7.3.1 and 7.4, the IaaS workflow engine can still become overloaded by other triggers, for example if the Manager Service is manually started on the secondary IaaS Manager node while automatic failover is not enabled. In that case, the same workaround below still applies.


Workaround:

vRealize Automation 7.0, 7.0.1, 7.1, and 7.2

To work around the issue, stop the passive Manager Service and restart the active one.

If swapping the Manager Service roles does not resolve the stuck-in-"Requested" issue, follow the SQL instructions below for vRealize Automation 7.3.

vRealize Automation 7.3

To work around the issue: 

  1. Run the following SQL query against the IaaS SQL Server database. The query lists all VMs stuck in "Requested" status in descending creation-date order. These VMs need to be deleted from the IaaS SQL database to free the IaaS workflow engine.
SELECT vm.VirtualMachineId, vm.VirtualMachineName, vm.VirtualMachineState, vm.RecCreationTime, vm.VMCreationDate
FROM InstanceState ins
JOIN VirtualMachine vm ON (ins.uidInstanceID = vm.VirtualMachineID)
WHERE vm.VirtualMachineState = 'Requested'
ORDER BY vm.VMCreationDate desc
  2. Stop all IaaS services (Manager Service, DEM Orchestrator, DEM Workers, proxy agents, etc.).
  3. Back up any related appliances and databases (both the IaaS SQL Server database and the vRA database) so they can be restored if any of the steps below fail.
  4. Compile the attached usp_RemoveVMFromVRA.sql against the IaaS SQL Server database.
  5. Open the attached clear_requested_vm.sql in a SQL query window and run the query to remove the VMs stuck in "Requested" status.
  6. Restart all IaaS services.
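The service stop/restart steps can be scripted on the IaaS Windows nodes. The sketch below only generates the `net stop` / `net start` command lines for an administrator to review before running; the service names in the list are placeholders, not necessarily the exact names registered on your nodes (they vary between deployments, so check services.msc):

```python
# Hypothetical display names -- replace with the service names actually
# registered on your IaaS Windows nodes; they vary between deployments.
IAAS_SERVICES = [
    "VMware vCloud Automation Center Service",  # Manager Service
    "VMware DEM-Orchestrator",                  # placeholder name
    "VMware DEM-Worker",                        # placeholder name
    "VMware vCloud Automation Center Agent",    # proxy agent, placeholder
]

def service_commands(action):
    """Build the 'net stop' / 'net start' command lines for the IaaS
    Windows services. Execution order should follow your deployment's
    documented start/stop sequence."""
    assert action in ("stop", "start")
    return [f'net {action} "{name}"' for name in IAAS_SERVICES]
```

Printing `service_commands("stop")` and `service_commands("start")` gives a checklist to run (or paste into a batch file) on each node.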
Install the hotfix attached to this article (applicable to 7.3.0 GA only):
  1. Download hf-1988761.7z
  2. Unzip/extract the file contents to the Windows IaaS Manager Service component systems.
  3. Three files need to be replaced:
    • DynamicOps.Core.Common.dll
    • DynamicOps.VMPS.CommonRuntime.dll
    • DynamicOps.VRM.DataCollectionService.dll
  4. Create backups of the old files by renaming each file or moving it to a backup directory in the Windows file system.
  5. Replace all three files with the extracted files.
  6. Restart the Manager Service.
  6. Restart the Manager Service.
Note: "DynamicOps.Core.Common.dll" is located in multiple folders within "<IaaS installation location>\VMware\vCAC".
Although the change is related to the Manager Service only, ensure every occurrence of "DynamicOps.Core.Common.dll" is updated.
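The backup-and-replace note above can be automated: find every copy of the DLL under the install tree, keep the original as a `.bak` file, and drop in the patched copy. A minimal sketch, assuming nothing beyond the file names given in this article (the function name and backup convention are illustrative):

```python
import shutil
from pathlib import Path

def replace_dll_everywhere(install_root, patched_dll,
                           dll_name="DynamicOps.Core.Common.dll"):
    """Replace every occurrence of dll_name under install_root with the
    patched copy, keeping the original next to it as <name>.bak.
    Returns the list of replaced paths."""
    replaced = []
    # Materialize the match list first, so renames during the loop
    # cannot disturb the directory walk.
    for old in list(Path(install_root).rglob(dll_name)):
        backup = old.parent / (old.name + ".bak")
        old.rename(backup)              # keep the original as a backup
        shutil.copy2(patched_dll, old)  # drop in the hotfix copy
        replaced.append(old)
    return replaced
```

Run it once per Manager Service node, pointing `install_root` at the IaaS installation location, and verify the returned list covers every folder mentioned in the note above before restarting the service.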

The IaaS workflow engine should now pick up new requests.


Attachments

usp_RemoveVMFromVRA.sql
clear_requested_vm.sql
hf-1988761.7z.gz