Jobs and tasks are scheduled, but it is taking a long time for clients to pick them up

Article ID: 174469

Products

IT Management Suite, Client Management Suite

Issue/Introduction

Clients take a long time to pick up scheduled jobs and tasks.

In the SMP Console, the task itself appears as "Queued".

Targeted client machines do not receive a notification that there is a new task to run. Anything that was previously scheduled runs just fine. However, if the Symantec Management Agent service is restarted, the task is received and executed.

This happens randomly, with no specific pattern. Usually, only a very small number of client machines are affected.

Environment

ITMS 8.1 RU7 and earlier, and ITMS 8.5 versions prior to RU3

Cause

Task execution relies on a tickle connection between the Symantec Management Platform (SMP), a remote Task Server (TS), and the Symantec Management Agent (SMA) on a client. When a task is scheduled, the SMP sends a tickle packet and the task definition to the Task Server that holds the registration record for that specific client. The TS then sends a tickle packet to the client with the notification 'I have a task for you, please take it'. In most cases, the symptoms above occur when the tickle connection between the Task Server and the client does not work. Without the tickle connection, the client periodically asks the TS whether it has a task for it. By default, this request is made once every 30 minutes; the interval is controlled by the Client Task Agent (CTA) policy on the SMP under Settings>Notification Server>Task Settings>Task Agent Settings, "Check Task Server for new task every:".

The client establishes a tickle connection when it registers with a Task Server, so the connection is restored when the SMA is restarted or when you press the 'Reset Agent' button on the Task Status tab of the SMA UI.
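To put the polling fallback in numbers, the following minimal sketch estimates how long a client without a tickle connection waits before it asks the Task Server for new tasks. The function name `pickup_delay_minutes` is hypothetical, and the sketch assumes the task is scheduled at a uniformly random moment within the polling cycle, which is a simplification rather than documented product behavior.

```python
from typing import Tuple


def pickup_delay_minutes(poll_interval_minutes: float) -> Tuple[float, float]:
    """Return (average, worst-case) wait in minutes before the next poll.

    Assumption: the task is scheduled at a uniformly random point in the
    polling cycle, so the average wait is half the interval.
    """
    if poll_interval_minutes <= 0:
        raise ValueError("poll interval must be positive")
    return poll_interval_minutes / 2, poll_interval_minutes


# 30 minutes is the default "Check Task Server for new task every:" value.
average, worst = pickup_delay_minutes(30)
print(f"average wait: {average:.0f} min, worst case: {worst:.0f} min")
```

With the default interval, an affected client can therefore sit in the "Queued" state for up to 30 minutes even though nothing else is wrong.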

A tickle connection might be broken by:

  1. A restart of the Task Server service (atrshost service) on a remote Task Server
  2. Firewall
  3. Problems with the network adapter on a Task Server or on a client
  4. SMP is not aware that the client machine is actually registered to a Task Server so it doesn't assign the task to it

In this particular instance, the following was also noticed:

  1. Some of the affected client machines showed as "assigned to a task server" under the Task Status tab.
  2. In the database, the following query lists the task servers that have agents in this state:

     -- Last registered Task Server of each client that has an unfinished task
     -- instance but no matching row in Inv_Client_Task_Resources.
     -- 28 is the length of the opening tag '<property name="LastServer">'.
     select
       (select replace(replace(substring(cta.Properties, charindex('<property name="LastServer">', cta.Properties) + 28, 100), '</property>', ''), '</properties>', '')
          from ClientTaskAgentPersistentSettings cta
         where cta.ResourceGuid = ti.ResourceGuid) as LastRegisteredTaskServer
     from TaskInstances ti
     join vRM_Computer_Item c on c.Guid = ti.ResourceGuid
     join vRM_Computer_Item c2 on c2.Guid = ti.TaskServerGuid
     left join Inv_Client_Task_Resources ctr on ctr._ResourceGuid = ti.ResourceGuid
     where ti.EndTime is null          -- task instance has not finished
       and ctr._ResourceGuid is null   -- no matching row in Inv_Client_Task_Resources

  3. On one of the task servers to which the agent was registered, the "Client Data Loader" service was restarted.
  4. After that, the client machines in the queued state received the task and ran it.
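The string handling in the query above can be easier to follow outside of SQL. The sketch below extracts the `LastServer` property from a Properties value in the same spirit as the SUBSTRING/CHARINDEX expression. The function name `last_registered_task_server` and the sample XML (including the host name) are hypothetical; only the `<property name="LastServer">` tag comes from the query.

```python
import re
from typing import Optional

# Matches the text between the LastServer opening tag and the next closing tag.
_LAST_SERVER = re.compile(
    r'<property name="LastServer">(.*?)</property>', re.DOTALL
)


def last_registered_task_server(properties: str) -> Optional[str]:
    """Return the LastServer value from a Properties string, or None if absent."""
    match = _LAST_SERVER.search(properties)
    return match.group(1).strip() if match else None


# Hypothetical sample; real Properties values may carry more entries.
sample = (
    "<properties>"
    '<property name="LastServer">ts01.example.local</property>'
    "</properties>"
)
print(last_registered_task_server(sample))
```

Unlike the fixed 100-character window used in the query, the regular expression stops at the closing tag, so no trailing markup has to be stripped afterwards.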

A quick workaround is to restart the "Client Data Loader" service (which also restarts the Altiris Object Host service) on the affected task server; things then return to normal for a while.

This also explains why the task was received and executed when the customer originally restarted the SMA service: the restart reset the task server connection, forcing the SMP to recognize that a task server was assigned to the client machine.

Resolution

This issue has been addressed in the most recent releases of ITMS 8.5 (small improvements were made starting with RU1). This scenario is one of several Task Management areas that the Symantec Development team worked to improve in the ITMS 8.5 release:
1. 8.5 improves the stability of the CTDataloader and AtrsHost services.
2. 8.5 fixes a problem with the IP address check that prevented clients from establishing a tickle connection.

Upgrade to ITMS 8.5 RU3 or later, since further enhancements have been added to the stability of the Task services.

From the SMP perspective, the client machines appeared not to be registered to a task server, even though the client machines themselves reported that they were registered to it.

The current workaround for 8.1 RU7 (or earlier) and for 8.5 releases prior to RU3 is to restart the Altiris Client Data Loader (CTDataloader) service on the Task Server assigned to the affected machines (this restarts the AtrsHost service as well).

In addition, the attached "Restart potentially partially hung Task Servers with non registered clients.zip" Automation policy can be imported to automate restarting the required service when the situation is detected server-side:

  1. Download and extract the attached "Restart potentially partially hung Task Servers with non registered clients.zip"
  2. Import the custom Service Control Task called "Restart Task Server Services Control Service.xml" into Manage>Jobs and Tasks. This task restarts the "ctdataloader" service with System rights.
  3. Import the custom Automation policy called "Restart potentially partially hung Task Servers with non registered clients.xml" into Manage>Automation policies.
    Note: To confirm that the task that restarts the Task Server services ran, check the agent logs on that Task Server for an entry like this:

    Operation started: Stopping the service Altiris Client Task Data Loader ...
    -----------------------------------------------------------------------------------------------------
    Date: 1/15/2020 10:24:03 AM, Tick Count: 990092326 (11.11:01:32.3260000), 
    Process: CTDataLoad.exe (2044), Thread ID: 3696, Module: CTDataLoad.exe

    The service Altiris Object Host Service is running, stopping it ...
    -----------------------------------------------------------------------------------------------------
    Date: 1/15/2020 10:24:03 AM, Tick Count: 990092341 (11.11:01:32.3410000),
    Process: CTDataLoad.exe (2044), Thread ID: 2128, Module: CTDataLoad.exe
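If many Task Servers need to be checked, the search for this entry can be scripted. The following is a minimal sketch that assumes the agent log has been exported as plain text laid out like the excerpt above (message line, separator line, then a "Date:" line); the actual on-disk log format may differ. The function name `find_restart_times` is hypothetical.

```python
import re
from datetime import datetime
from typing import List

STOP_MESSAGE = (
    "Operation started: Stopping the service Altiris Client Task Data Loader"
)
# Date layout taken from the excerpt, e.g. "Date: 1/15/2020 10:24:03 AM".
DATE_PATTERN = re.compile(
    r"Date:\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)


def find_restart_times(log_text: str) -> List[datetime]:
    """Return the timestamps of all 'stopping CTDataLoader' entries found."""
    times = []
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if STOP_MESSAGE not in line:
            continue
        # Assumption: the date follows within the next three lines.
        for follow in lines[index + 1 : index + 4]:
            match = DATE_PATTERN.search(follow)
            if match:
                times.append(
                    datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
                )
                break
    return times


sample_log = """\
Operation started: Stopping the service Altiris Client Task Data Loader ...
-----------------------------------------------------------------------------
Date: 1/15/2020 10:24:03 AM, Tick Count: 990092326 (11.11:01:32.3260000),
Process: CTDataLoad.exe (2044), Thread ID: 3696, Module: CTDataLoad.exe
"""
for restart_time in find_restart_times(sample_log):
    print(restart_time.isoformat())
```

An empty result means no restart entry was found in the text that was scanned, not necessarily that the task never ran.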

Because the Task Server services have been restarted, send the desired tasks again if their "Queued" status did not change. The next execution round for the desired task should now return the proper execution status.

Note:

If the workaround does not help and you need to troubleshoot this issue further, keep the following in mind:

Collect verbose logging on the SMP, TS, and SMA, and enable extended logging for Task Management/Task Server:

  1. Enable verbose logging on the SMP, on the remote TS, and on the problematic client

  2. Enable extended logging on the SMP and on the remote TS using the attached file "ExtendedLoggingv2.7z"

  3. Reproduce the problem with a problematic client

  4. Collect the logs from all computers (SMP, remote TS, problematic client) and specify the task instance GUID for investigation (Step 1)

  5. Reset the Client Task Agent registration on the problematic client (Agent UI -> Task Status -> Reset Agent)

  6. Reproduce the problem with the problematic client again

  7. Collect the logs from all computers (SMP, remote TS, problematic client) and specify the task instance GUID for investigation (Step 2)

  8. Disable extended logging on the SMP and on the remote TS using the attached file "ExtendedLoggingv2.7z"

Attachments

1619034883653__Restart potentially partially hung Task Servers with non registered clients.zip
ExtendedLoggingV2.zip
ExtendedLoggingV2.7z