Taskinstances table grows rapidly
search cancel

Taskinstances table grows rapidly

book

Article ID: 152933

calendar_today

Updated On:

Products

IT Management Suite Audit Integration Component

Issue/Introduction

The Symantec_CMDB dataBase grows fast, all tasks tables grow quickly, in the log viewer you see several of the following errors associated with the cleanup task:

Error deleting: Altiris.TaskManagement.Data.TaskExecutionInstanceNotFoundException: Unable to find task instance 00000000-0000-0000-0000-000000000000 in the database.

The above error is showing up several times a second in the a.log files, filling the logs.  This error almost always shows up with two other errors, one of which indicates a Deadlock.  However, the SQL server is showing no locking at all.

Process: AtrsHost (2068)
Module: AtrsHost.exe
Source: Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks
Description: AltirisSqlHelper.RepeatForDeadlocks(): Non-deadlock exception: Altiris.TaskManagement.Data.TaskExecutionInstanceNotFoundException: Unable to find task instance 00000000-0000-0000-0000-000000000000 in the database.
   at Altiris.TaskManagement.Data.TaskExecutionInstance.GetTaskInstance(TaskInstanceGuid taskInstanceGuid)
   at Altiris.TaskManagement.Data.TaskExecutionInstance.<>c__DisplayClass2.<DeleteTaskInstance>b__1(DatabaseContext ctx, Object state)
   at Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks(Int32 retries, Int32 sleep, Object state, RepeatForDeadlocksDelegate func)

Process: AtrsHost (2068)
Module: AtrsHost.exe
Source: Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks
Description: AltirisSqlHelper.RepeatForDeadlocks(): Failed all retries

Process: AtrsHost (2068)
Module: AtrsHost.exe
Source: Altiris.TaskManagement.Maintenance.CleanupTaskDataTask.DeleteExcessWorkingData
Description: CleanupTaskDataTask.DeleteExcessWorkingData(): Error deleting: Altiris.TaskManagement.Data.TaskExecutionInstanceNotFoundException: Unable to find task instance 00000000-0000-0000-0000-000000000000 in the database.
   at Altiris.TaskManagement.Data.TaskExecutionInstance.GetTaskInstance(TaskInstanceGuid taskInstanceGuid)
   at Altiris.TaskManagement.Data.TaskExecutionInstance.<>c__DisplayClass2.<DeleteTaskInstance>b__1(DatabaseContext ctx, Object state)
   at Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks(Int32 retries, Int32 sleep, Object state, RepeatForDeadlocksDelegate func)
   at Altiris.TaskManagement.Data.TaskExecutionInstance.DeleteTaskInstance(TaskInstanceGuid taskInstanceGuid)
   at Altiris.TaskManagement.Maintenance.CleanupTaskDataTask.DeleteExcessWorkingData(WaitHandle eventStop).

Environment

ITMS 7.x, 8.x

Cause

This issue is caused by following two reasons: 

  1. The clients have an old task that they are trying to upload status for. It results in spamming the server with the results from an old or bogus task causing the server to fill up with "junk" data.
  2. There is a problem with the cleanup task getting data in the tables out of sync then getting to a point where it just fails.  Any high-load on the NS server or high I/O disk values on the SQL (if off-box) should be considered specially while running the work around.

Resolution

Possible workarounds as below: 
 

1- Stop and change the Cleanup Task

What the weekly change is for, is to prevent the errors from showing up again, except once/week. Each time it runs, you should check to see if it ran successfully, and if not, stop it and delete it again. You'll need to continue doing this until a fix can be found.
 
If task is disabled instead of cleanup, then NO data will be purged, and the table will grow unexpectedly. Perform cleanup on weekly basis. If issue continues, running it manually or daily is recommended.
To help prevent the on demand clean up task from running change the cleanup options and set the maximum number of working database rows to 1 Million.
 

2- Manually Delete the Data

Note: Truncating these tables will cause all task history to be removed. If there is task history data that is needed, it must be retrieved before truncating the tables.
 
Note: You may have to stop the Altiris Object host and Dataloadeder services if you are having problems truncating the tables.
  • Find the Cleanup Task Data task, Settings>All Settings>Notification Server>Task Settings - Cleanup Task Data. Find all running instances (not the one that says "Pending" under the Status), and stop them. About every minute, refresh this page until they all show a big red X.
  • At that point, delete the run instances. By this time, you'll notice that the logs are no longer filling with errors.
  • Now, modify the schedule (the one that says pending) to run weekly instead of daily.
Delete data in the database by truncating the following Task Tables:
 
i.e.  Use the following SQL command: Truncate table (Task Table name)
 
If OBJECT_ID('TaskInstanceSummaries', N'U') is not null
Truncate table TaskInstanceSummaries
If OBJECT_ID('TaskInstanceResults', N'U') is not null
Truncate table TaskInstanceResults
If OBJECT_ID('TaskInstances', N'U') is not null
Truncate table TaskInstances 
If OBJECT_ID('TaskInstancesCompleted', N'U') is not null
Truncate table TaskInstancesCompleted 
If OBJECT_ID('TaskInstancesIncomplete', N'U') is not null
Truncate table TaskInstancesIncomplete
If OBJECT_ID('TaskinstanceParents', N'U') is not null
Truncate table TaskinstanceParents
If OBJECT_ID('TaskInstancesStarted', N'U') is not null
Truncate table TaskInstancesStarted
If OBJECT_ID('TaskInstanceStatus', N'U') is not null
Truncate table TaskInstanceStatus
If OBJECT_ID('TaskOutputPropertyValue', N'U') is not null
Truncate table TaskOutputPropertyValue
If OBJECT_ID('TaskInstanceEvents', N'U') is not null
Truncate table TaskInstanceEvents
If OBJECT_ID('TaskInstanceChildEvents', N'U') is not null
Truncate table TaskInstanceChildEvents  
If OBJECT_ID('TaskInstanceExecutionInfo', N'U') is not null
Truncate table TaskInstanceExecutionInfo
If OBJECT_ID('TaskInstanceJobNodes', N'U') is not null
Truncate table TaskInstanceJobNodes
If OBJECT_ID('TaskInstanceresultSummaries', N'U') is not null
Truncate table TaskInstanceresultSummaries
If OBJECT_ID('ServerTaskInstancerequests', N'U') is not null
Truncate table ServerTaskInstancerequests
If OBJECT_ID('ClientTaskInstancerequests', N'U') is not null
Truncate table ClientTaskInstancerequests
If OBJECT_ID('TaskOutputProperty', N'U') is not null
Truncate table TaskOutputProperty
 
 
3- Run a Cleanup Script on Clients
  • Create a New Run Script task, Manage>Jobs and Tasks>System Jobs and Tasks>Software>right click Patch Management select New then Task. Scroll down and find Run Script, rename then copy and paste the following syntax in to the box. Then click OK, this will save the Task.
  • Scroll down to New Client Job, click Add Existing. Select the Run Script you just created, click OK. Uncheck the box-Fail Job if any Task fails, Save Changes. Click New Schedule, at this point you can put in a schedule or run it Now. Note: Only run this job once, No Repeat. Click Add and put in your Target or Computers. Click the Schedule button and you are done. It runs according to what you just set.
Note: The Cleanup Script below does have a pop-up window the user will see. Pop-up will be up 1-10 seconds, then runs in the background. Runs up to 2 minutes, and the agent will disappear in this process.
           Process completed when the agent is visible again
 
Cleanup Script 
---------------------------------------------------------

REM Modify the run version if this script needs to be ran again on a machine that has already executed this script.
REM This value is in place to prevent task looping
Set RunVer=1

REM Get the Altiris Agent install path
FOR /F "tokens=2*" %%A IN ('REG.EXE QUERY "HKLM\Software\Altiris\Altiris Agent" /V "installdir"') DO SET AgentDir=%%B
set tempbat=%temp%\AgentClean.bat"

FOR /F "tokens=2*" %%A IN ('REG.EXE QUERY "HKLM\Software\Altiris\Altiris Agent" /V "CleanupVer"') DO SET CleanupVer=%%B
if Ver%RunVer%==Ver%CleanupVer% exit

REM add Run value to registry to prevent looping incase task cannot report success to server
reg add "HKLM\Software\Altiris\Altiris Agent" /v CleanupVer /t REG_SZ /d %RunVer% /f

REM Create temporary batch file to execute while the agent restarts
echo ping localhost -n 30 > %tempbat%
echo "%AgentDir%\aexagentutil" /stop >> %tempbat%
echo rmdir "%AgentDir%\TaskManagement" /s /q >> %tempbat%
echo ping localhost -n 30 >> %tempbat%
echo "%AgentDir%\aexagentutil" /start >> %tempbat%
echo exit >> %tempbat%
REM Executes temporary batch file
start "" /MIN %tempbat%

---------------------------------------------------------
 
Now find the machines which are spamming the server using task results. To do this, first of all you have to find what results are frequently sent to the NS by those client machines. Run the following SQL query to see what Tasks have had data sent up to the server in the last 24 hours.
*Stop the service "Altiris Client Task Data Loader" and "Altiris Support Service".
*Stop all the running "Cleanup Task Data Daily" from Altiris Console>Settings>Notification Server>Task Settings>Cleanup Task Data.
*Run Cleanup task data manually with the attached SQL query.
*Run the stopped services "Altiris Client Task Data Loader" and "Altiris Support Service".
*Change the schedule time of the "Cleanup Task Data Daily" to few minutes later to re associated the task with the windows schedule.
By following these steps, the error might get reproduced when you run the services again as there is a regular check for the "taskinstances" table not to exceed 250000 rows causes an "On-Demand Cleanup Task Data". You have to stop it from the console, start the Cleanup Task Data from the console (right click > Start Now), if the error appears again just stop the running Cleanup Task Data, and re-run it when you see that it returned a value of success or fail, this should be performed until all of the failed Cleanup Task Data tasks disappear, and be sure that it can run without problem on its scheduled time.
----------------------------------------------------------------------------------
If Null values are found frequently under following query results, it will confirm that there is a problem
select max(i.name) as Name,TaskVersionGuid, count(*) As count from taskinstances ti
left join itemversions iv
on ti.TaskVersionguid = iv.Versionguid
left join item i
on iv.Itemguid = i.guid
join Taskinstanceresults tir
on tir.TaskInstanceGuid = ti.TaskInstanceGuid
where tir.endtime > getdate() -1
group by TaskVersionGuid
order by count DESC
-------------------------------------------------------
If the query results show that there is NULL value in their name, then it will confirm that it is likely to be problematic task. Please copy that TaskVersionGuid and put it in for the following Query
-------------------------------------------------------
declare @TaskVersionGuid uniqueidentifier
set @TaskVersionGuid='XXXXXX-XXXXXXXX-XXXXXXXX'

select max(vc.name) as Name, ResourceGuid, Count(*) as Count from TaskInstances ti
join vcomputer vc
on ti.ResourceGuid = vc.Guid
where ti.TaskVersionGuid = @TaskVersionGuid
group by ResourceGuid
order by count DESC
--------------------------------------------------------------
Run the above script on all affected machines.

Additional Information

177057 "The TaskOutputPropertyValue table is growing very large"

176114 "The TaskOutputProperty table is getting very large, how can we clean it up?"