Clients take a long time to pick up scheduled jobs and tasks.
In the SMP Console, the task itself shows a status of "Queued".
Targeted client machines don't receive a notification that there is a new task to run. Anything that was previously scheduled runs just fine. However, once the Symantec Management Agent service is restarted, the task is received and executed.
This happens randomly, with no specific pattern, and usually affects only a very small number of client machines.
ITMS 8.1 RU7 and earlier versions of 8.5 (prior to RU3)
Task execution relies on a tickle connection between the Symantec Management Platform (SMP), the remote Task Server (TS), and the Symantec Management Agent (SMA) on a client. When a task is scheduled, the SMP sends a tickle packet and the task definition to the Task Server that holds the registration record for that specific client. The TS then sends a tickle packet to the client with the notification 'I have a task for you, please take it'. In most cases, the symptoms above appear when the tickle connection between the Task Server and the client is not working. Without the tickle connection, the client periodically asks the TS "do you have a task for me?". By default this request is made once every 30 minutes; the interval is controlled by the Client Task Agent (CTA) policy on the SMP under Settings>Notification Server>Task Settings>Task Agent Settings, "Check Task Server for new task every:".
The client establishes the tickle connection while registering with a Task Server, so the connection is restored when the SMA is restarted or when you press the 'Reset Agent' button on the Task Status tab of the SMA UI.
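To make the difference between the two notification paths concrete, here is a minimal Python sketch of the timing only. It is a simplified model, not product code: the function name expected_pickup and the example times are hypothetical, and the only value taken from this article is the 30-minute default check interval.

```python
from datetime import datetime, timedelta

# Default "Check Task Server for new task every:" value quoted in this article.
DEFAULT_POLL_INTERVAL = timedelta(minutes=30)


def expected_pickup(scheduled_at, last_poll, tickle_ok,
                    poll_interval=DEFAULT_POLL_INTERVAL):
    """Return the earliest time the client learns about a task (simplified model).

    Assumes scheduled_at is not earlier than last_poll.
    """
    if tickle_ok:
        # Tickle path: the Task Server notifies the client right away.
        return scheduled_at
    # Fallback path: the client finds the task at its next periodic check.
    elapsed = scheduled_at - last_poll
    polls_needed = -(-elapsed // poll_interval)  # ceiling division
    return last_poll + polls_needed * poll_interval


last_poll = datetime(2020, 1, 15, 10, 0)
scheduled = datetime(2020, 1, 15, 10, 5)
print(expected_pickup(scheduled, last_poll, tickle_ok=True))   # 2020-01-15 10:05:00
print(expected_pickup(scheduled, last_poll, tickle_ok=False))  # 2020-01-15 10:30:00
```

In this model a client with a broken tickle connection can wait up to one full check interval before it sees the task, which matches the "takes a long time" symptom.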
A tickle connection might be broken by:
In this particular instance, the following was also noticed:
-- Open task instances whose target computer has no row in Inv_Client_Task_Resources
select
    -- Last registered Task Server, extracted from the agent's persisted LastServer property
    (select replace(replace(substring(cta.Properties,
                charindex('<property name="LastServer">', cta.Properties) + 28, 100),
                '</property>', ''), '</properties>', '')
       from ClientTaskAgentPersistentSettings cta
      where cta.ResourceGuid = ti.ResourceGuid) LastRegisteredTaskServer
from TaskInstances ti
join vRM_Computer_Item c on c.Guid = ti.ResourceGuid
join vRM_Computer_Item c2 on c2.Guid = ti.TaskServerGuid
left join Inv_Client_Task_Resources ctr on ctr._ResourceGuid = ti.ResourceGuid
where 1=1
  and ti.EndTime is null          -- task instance has not finished
  and ctr._ResourceGuid is null   -- no matching row in Inv_Client_Task_Resources
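The string handling in the subquery can be hard to read in SQL, so the following Python sketch mirrors it step by step. The sample Properties value and the host name ts01.example.local are made up for illustration; the real layout of the Properties column is only inferred here from the string literals in the query.

```python
# Marker searched for by CHARINDEX in the query; it is 28 characters long,
# which is where the "+ 28" offset comes from.
MARKER = '<property name="LastServer">'


def last_registered_task_server(properties):
    """Mirror the SUBSTRING/CHARINDEX/REPLACE logic of the query."""
    start = properties.find(MARKER)
    if start == -1:
        # The SQL has no explicit guard for a missing marker; this sketch
        # simply returns an empty string in that case.
        return ""
    value_start = start + len(MARKER)
    # SUBSTRING(..., 100): take at most 100 characters after the marker.
    chunk = properties[value_start:value_start + 100]
    # Strip the closing tags that follow the value.
    return chunk.replace("</property>", "").replace("</properties>", "")


# Hypothetical sample value, assuming LastServer is the last property stored.
sample = ('<properties><property name="LastServer">'
          'ts01.example.local</property></properties>')
print(len(MARKER))                          # 28
print(last_registered_task_server(sample))  # ts01.example.local
```

Like the query, this only yields a clean value when nothing but closing tags follows the LastServer value within the 100-character window.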
The quick workaround appears to be restarting the "Client Data Loader" service (which also restarts the Altiris Object Host service) on the affected Task Server, after which things return to normal for a while.
This is why the task was received and executed when the customer originally restarted the SMA service: the restart reset the Task Server connection, forcing the SMP to recognize that a Task Server was assigned to the client machine.
This issue has been addressed in the most recent releases of ITMS 8.5 (small improvements were made starting with RU1). This scenario is one of several Task Management areas that the Symantec Development team worked to improve in the ITMS 8.5 release:
1. 8.5 improves the stability of the CTDataloader and AtrsHost services.
2. 8.5 fixes a problem with the IP address check that prevented clients from establishing a tickle connection.
So, try upgrading to ITMS 8.5 RU3 or later, since further enhancements have been made to the stability of the Task services.
From the SMP perspective, the client machines appeared not to be registered to a Task Server, even though the client machines themselves reported that they were registered to it.
The current workaround for 8.1 RU7 (or earlier) and for 8.5 releases prior to RU3 is to restart the Altiris Client Data Loader (CTDataloader) service on the Task Server assigned to the affected machines (restarting this service restarts the AtrsHost service as well).
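For a manual restart from a script, a minimal Python sketch along the following lines could be run on the affected Task Server. It is an illustration under assumptions, not a supported tool: the display name is taken from the log line "Stopping the service Altiris Client Task Data Loader" in this article and may differ in your environment, the helper names are hypothetical, and any confirmation prompt that Windows shows for dependent services is not handled. By default the sketch only prints the commands.

```python
import subprocess

# Display name as it appears in this article's log excerpt (assumption: it
# matches the service's display name on your Task Server).
SERVICE_DISPLAY_NAME = "Altiris Client Task Data Loader"


def restart_commands(service=SERVICE_DISPLAY_NAME):
    """Build the stop/start command lines without running them."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_service(service=SERVICE_DISPLAY_NAME, dry_run=True):
    """Print the commands; run them only when dry_run is False."""
    for cmd in restart_commands(service):
        print(" ".join(cmd))
        if not dry_run:
            # Requires an elevated prompt on the Task Server itself.
            subprocess.run(cmd, check=True)


restart_service()  # dry run: prints the two commands only
```

Calling restart_service(dry_run=False) would actually execute the stop and start commands, so keep the dry run until the service name has been confirmed in the Services console.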
In addition, the attached "Restart potentially partially hung Task Servers with non registered clients.zip" Automation policy can be imported to automate restarting the required service when the situation is detected server-side:
Operation started: Stopping the service Altiris Client Task Data Loader ...
-----------------------------------------------------------------------------------------------------
Date: 1/15/2020 10:24:03 AM, Tick Count: 990092326 (11.11:01:32.3260000),
Process: CTDataLoad.exe (2044), Thread ID: 3696, Module: CTDataLoad.exe
The service Altiris Object Host Service is running, stopping it ...
-----------------------------------------------------------------------------------------------------
Date: 1/15/2020 10:24:03 AM, Tick Count: 990092341 (11.11:01:32.3410000),
Process: CTDataLoad.exe (2044), Thread ID: 2128, Module: CTDataLoad.exe
Once the Task Server services have been restarted, send the desired tasks again if their "Queued" status has not changed. The next execution round for those tasks should then return the proper execution status.
Note:
A few things to keep in mind if the workaround above doesn't help and you need to troubleshoot this issue further:
Collect verbose logging on the SMP, TS, and SMA, and enable extended logging for Task Management/Task Server:
Enable verbose logging on the SMP, on the remote TS, and on the problematic client
Enable extended logging on the SMP and on the remote TS using the attached file "ExtendedLoggingv2.7z"
Reproduce the problem with a problematic client
Collect the logs from all computers (SMP, remote TS, problematic client) and specify the task instance GUID for investigation (Step 1)
Reset the Client Task Agent registration on the problematic client (Agent UI -> Task Status -> Reset Agent)
Reproduce the problem with the problematic client again
Collect the logs from all computers (SMP, remote TS, problematic client) and specify the task instance GUID for investigation (Step 2)
Disable extended logging on the SMP and on the remote TS using the attached file "ExtendedLoggingv2.7z"