Troubleshooting NSE processing issues

Article ID: 205352

Updated On: 05-22-2025

Products

IT Management Suite, Client Management Suite

Issue/Introduction

This article addresses the following use cases:

  1. A large number of NSEs are accumulating in the EvtQueue folder
  2. The NSEs are not processing fast enough
  3. You want to know where these NSEs are coming from

Environment

ITMS 8.x, 8.7.x, 8.8

Resolution

ITMS 8.5 RU3 and 8.5 RU4 added multiple improvements to NSE processing stability and performance. If you have not upgraded to the most recent version of ITMS and NSE processing issues are a common problem, we recommend that you upgrade and take advantage of those improvements.

Also, in ITMS 8.6, our Dev team made additional changes. Previously, to find candidates to process, the event engine ("eventengine") used stored procedures to pick the single oldest NSE per computer, then the oldest NSE for another computer, and so on. This is quite an expensive SQL query. Starting with 8.6, SMP retrieves all NSEs for the computer and performs the processing ordering at the application level, reducing the impact on SQL.
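The following is only a simplified sketch of the "oldest NSE per computer" selection pattern described above; it is not the actual stored procedure used by the event engine, just an illustration against the EventQueueEntry table referenced later in this article:

--Illustrative only: oldest pending queue entry per sending computer
--([Source] holds the computer GUID; a lower Id means an older entry)
select e.[Source] as ComputerGuid, min(e.Id) as OldestEntryId
from EventQueueEntry e
group by e.[Source]
order by OldestEntryId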

As well, in ITMS 8.8, our Dev team included further enhancements to give a better picture of what could be triggering all those incoming events.

The following are suggestions on how you could troubleshoot most NSE processing issues. 

Background:

The most common reason for issues with NSE processing is that client machines are sending too many NSEs at once. Most cases are related to very aggressive Inventory policies (sending delta or full inventory too frequently), or to machines that were not connected to the internal network for a while (because they are not using Cloud-enabled Management (CEM), or because of some other network issue with agent connectivity), causing NSEs to accumulate in the local queue folder (under ...\program files\Altiris\Altiris Agent\Queue). As soon as these machines connect again, they try to send everything that they were holding.
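If you suspect this kind of burst, a simple sketch like the one below (using the same Evt_NS_Event_History table that the queries later in this article use) shows the hourly NSE volume for the last 24 hours, which makes reconnect spikes easy to spot:

--Hourly NSE volume over the last 24 hours; large spikes often match
--reconnecting machines flushing their local agent queues
select dateadd(hour, datediff(hour, 0, _eventTime), 0) as EventHour,
       count(*) as EventCount
from Evt_NS_Event_History
where _eventTime >= dateadd(day, -1, getdate())
group by dateadd(hour, datediff(hour, 0, _eventTime), 0)
order by EventHour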

Suggestions:

1. Understand what those NSEs are and where they are coming from.
With this, try to identify what type of NSEs they are: Basic Inventory, Hardware Inventory, login/logoff events, etc., as well as whether they are coming from certain machines.

There are two ways to identify these incoming NSEs:

Using "Event Data Analytics":


With the ITMS 8.8 release, there is a new feature for System Health: Metadata Statistics (for Event Data Analytics). There are new reports that should help you narrow down patterns and identify which policies may need adjustments.

Refer to "Using Event Data Analytics for understanding SMP Server performance


Using SSETools:

You can use SSETools "NSE diagnostics", which can help you see the NSE type (displayed under Scenario Counts) and which machines they come from (under Resource counts).


Note: SSE tools only analyze file events in EvtQueue. However, this is just a small fraction of all events. Smaller NSEs are kept directly in the database as inline messages.

You can also use Evaluating NSE data using SQL when a deeper analysis is needed to obtain the desired information.

Note:
In some situations where many NSEs are received but they are being processed faster than you have a chance to review them, you can capture them and save a copy in a different folder. Refer to Capture processed NSEs on the Notification Server.
You can also capture "bad" NSEs that are being ignored. See: How do I get the Bad NSE folders to collect bad NSEs for review again?

If you prefer to use the information available in the database, you can use queries like the ones below to show you what may be happening:

--Average NSE count per computer, by event type, for the last 24 hours
declare @compcount as int = (select count(*) from vComputer)

select ItemName, count(ResourceGuid) / @compcount as AvgPerComputer
from Evt_NS_Event_History h
where _eventTime >= GETDATE()-1
group by ItemName
order by 2 desc

--Find which machines have the most NSEs in the queue (above 500)
select c.Name, e.[Source], count(*) as NSECount
from EventQueueEntry e
join vRM_Computer_Item c on c.Guid = e.[Source]
group by c.Name, e.[Source]
having count(*) > 500
order by 3 desc

--Machines and their NSE totals for a specific period of time:

select c.Guid, c.Name, count(*) as EventCount
from Evt_NS_Event_History h
join vRM_Computer_Item c on c.Guid = h.ResourceGuid
where h._eventTime between '2024-12-22 06:00:00.00' and '2024-12-23 11:00:00.00'
group by c.Guid, c.Name
order by 3 desc

--Take one of the GUIDs from the previous query with the highest counts:

select _eventTime, ItemGuid, ItemName, ResourceName
from  Evt_NS_Event_History 
where ResourceGuid = 'Add computer GUID here' 
and _eventTime between '2024-12-22 06:00:00.00' and '2024-12-23 11:00:00.00' 
order by _eventTime

Now that you know which machines may be the biggest offenders and what type of NSEs they are sending, you should be able to narrow down why those machines are sending that many NSEs (for example, if they are sending Basic Inventory more than once a day, collecting Inventory too frequently, etc.).

If you notice that multiple machines are sending a large number of NSEs, and their local queue folder (under ...\program files\Altiris\Altiris Agent\Queue) still holds too many NSEs, you can use the "FlushAgentEvents" core setting to instruct client machines to stop sending those NSEs and clear out their own queues. Refer to Clear queued events on endpoints in Symantec Management Platform 8.5.

2. Verify that there are no paused activities on the SMP server.
If you notice that multiple NSEs are coming in but nothing seems to be processing, check whether the SMP services (the Altiris Service, the Altiris File Receiver Service, and the Altiris Client Message Dispatcher service) are stopped.
Also verify that the following registry values are not set to 1:

HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\eXpress\Notification Server\PauseActivities
HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\eXpress\Notification Server\PausedNSMessaging

3. Enable extra verbosity in the NS logs for NSE processing.

Open the NS Log Viewer on the SMP server and go to Options > Extended verbosities.

This should reveal a large amount of statistics in the logs for analysis.

4. Verify that there is not an issue with poor SMP or SQL Server performance.
This is a more complicated step to validate since you will need to monitor the current state of your SQL Server and may depend on a DBA to do some troubleshooting.
With recent versions of the SMP (8.1 and later), the NS logs show a quick snapshot of what your systems are doing. Look for the "PerformanceSensor" source in the NS logs. It should look like this:

[SYSTEM]
 [app cpu: 0%, ram: 301.34 MB / 1%, uptime: 57.11:50:59.1137164]
 [ns cpu: 3%, ram: 4.70 GB / 24%, uptime: 55.18:31:50.3437500]
 [sql cpu: 4%, ram: 9.15 GB / 58.5% (Available physical memory is high), cpu history %: 23 / 3 / 3 / 3 / 3 / 17 / 4 / 3 / 3]
 [ns machine: SMP-MAIN (V), ram: 19.53 GB, cpu: 1x1995Mhz, versions: 8.5.5032.0, assembly: 8.5.5032.0]
 [sql machine: sql-main (V), ram: 15.62 GB, cpu: 1x1, affinity: 2 (AUTO), version: 13.0.5026.0 / Enterprise Edition (64-bit) / SP2, trip: 320]
 [pc physical: 0, virtual: 5, managed: 5, connectivity: 5, hierarchy: 0, ps: 1, ts: 2]
 [.NET 4.0.30319.42000]
-----------------------------------------------------------------------------------------------------
Date: 12/18/2020 11:16:37 AM, Tick Count: 672484485 (7.18:48:04.4850000), Host Name: SMP-MAIN, Size: 835 B
Process: AeXSvc (3064), Thread ID: 47, Module: AeXSVC.exe
Priority: 4, Source: PerformanceSensor

This displays vital information about CPU and memory usage on both your SMP and SQL servers, as well as the memory allocated, whether the servers are virtual or physical, and more.

Using the same "PerformanceSensor" source in the NS logs, you should be able to see queue information:

[Queues]
 [0: 0 / 0 B] => [0: 0 / 0 / 275 @ 16(0) t, 0.0 i/s, 04:18:00] [priority .. 19.07 MB]
 [1: 0 / 0 B] => [1: 0 / 0 / 4.85 k @ 16(0) t, 0.5 i/s, 02:42:10] [fast .. 244.14 KB]
 [2: 0 / 0 B] => [2: 0 / 0 / 24 @ 8(0) t, 0.0 i/s, 3.17:16:11] [default .. 4.77 MB]
 [3: 0 / 0 B] => [3: 0 / 0 / 0 @ 4(0) t, 0.0 i/s, 57.11:53:08] [slow .. 19.07 MB]
 [4: 0 / 0 B] => [4: 0 / 0 / 0 @ 2(0) t, 0.0 i/s, 57.11:53:08] [large, 19.07 MB +]
[Lifetime]
 [t=0, a=0, q=0, peak=0, done=5,146, speed=0.00, bps=0]
-----------------------------------------------------------------------------------------------------
Date: 12/18/2020 11:20:32 AM, Tick Count: 672719407 (7.18:51:59.4070000), Size: 817 B
Process: AeXSvc (3064), Thread ID: 46, Module: AeXSVC.exe
Priority: 4, Source: PerformanceSensor

This should give you an idea of how busy the queues are, which queue seems to be the busiest, whether the default or custom values are used for queue processing, and so on. The example entry above shows a normal state, with no busy queues, using the default core setting values.

NOTE: There are 5 queues (represented by the queueId column in the EventQueueEntry table and the Id column in the EventQueue table):
0 - priority, 1 - fast, 2 - normal/default, 3 - slow, 4 - large.

The corresponding thread pool core settings are:

MaxConcurrentPriorityMsgsThreadPoolSize is for the priority queue
MaxConcurrentFastMsgsThreadPoolSize is for the fast queue
MaxConcurrentDefaultMsgsThreadPoolSize is for the normal/default queue
MaxConcurrentSlowMsgsThreadPoolSize is for the slow queue
MaxConcurrentLargeMsgsThreadPoolSize is for the large queue
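If you prefer to check the queue depth directly in the database, a minimal sketch like this (using the queueId column described above) shows how many pending entries each queue currently holds:

--Pending queue entries per queue
--(0 = priority, 1 = fast, 2 = default, 3 = slow, 4 = large)
select QueueId, count(*) as PendingEntries
from EventQueueEntry
group by QueueId
order by QueueId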

After you have an understanding of the resources available and how busy the servers are:

a) you can determine whether you need to reboot the SQL Server or restart the SQL service
b) or whether it is a good time to try Troubleshoot NSE Processing in 8.5+, which provides guidance on truncating the EventQueue tables

NOTE:
Example of a bad queue processing configuration (from an ITMS 8.7.2 SMP Server having NSE processing issues):

[EventQueueDispatcher] [running, enabled]
 [76.91 k / 3.25 GB] => [300 / 462 / 57.08 k @ 500(300) t, 1.2 i/s, 07:00:24]
[Queues]
 [0: 21.25 k / 1021.10 MB] => [0: 100 / 156 @ 1(0,0) c / 1.91 k @ 100(100) t, 0.0 i/s, 00:00:25] [priority .. 20 MB]
 [1: 52.02 k / 862.45 MB] => [1: 100 / 150 @ 100(98,148) c / 52.42 k @ 100(100) t, 1.0 i/s] [fast .. 244.14 KB]
 [2: 3.64 k / 1.41 GB] => [2: 100 / 156 @ 71(50,0) c / 2.51 k @ 100(100) t, 0.2 i/s, 00:00:11] [default .. 4.77 MB]
 [3: 0 / 0 B] => [3: 0 / 0 @ 16(0,0) c / 245 @ 100(0) t, 0.0 i/s, 00:01:06] [slow .. 20 MB]
 [4: 0 / 0 B] => [4: 0 / 0 @ 1(0,0) c / 2 @ 100(0) t, 0.0 i/s, 02:08:40] [large, 20 MB +]
[Overall]
 [threads: 300 @ 300, queue: 300 (max: 301), done: # 57.08 k (3.21 GB), speed: 1.2 i/s (126.55 KBps)]
 [succeeded: # 57.08 k (3.21 GB), 1.2 i/s (126.55 KBps), 1.1 / 2.6 / 0.3 / 0.9]
 [failed: # 8 (1.47 MB), 0.0 i/s (1.54 KBps), 0.0 / 0.0 / 0.0 / 0.0]
-----------------------------------------------------------------------------------------------------
Date: 4/9/2025 5:47:30 AM, Tick Count: 25236859 (07:00:36.8590000), Size: 1.11 KB
Process: AeXSvc (6416), Thread ID: 261, Module: Altiris.NS.dll
Priority: 4, Source: PerformanceSensor

They are using 100 threads for each event queue (see the "@ 100(100)" values above).
That is too many NSEs being processed at the same time, and it causes problems rather than performance improvements.
More threads mean more deadlocks.

If you look at their [SYSTEM] log entry:

[SYSTEM]
 [ns cpu: 3%, ram: 9.25 GB / 14%, uptime: 6:47:20]
 [ns machine: SMPNS01 (V), ram: 64.00 GB, cpu: 32x2394Mhz, assembly: 8.7.3391.0, versions: 8.7.3391.0 (4/30/2024) / 8.7.1273.0 (5/4/2023) / 8.6.3268.0 (3/8/2022) / 8.6.1119.0 (2/18/2021) / 8.5.5713.0 (11/16/2020)]
 [ns os: Microsoft Windows Server 2016 Standard, 10.0.14393, en-US, TZ -420]
 [pc physical: 41179, virtual: 94, managed: 25085, policied in 24h: 17687, in cem: 9394, ps: 25, ts: 26]
 [licensing status: Expired: 3, Ok: 6]
 [fixes: 8.5 POST RU4, 8.5 POST RU4 ECV (v2), 8.5 POST RU4 ULM (v1), 8.5 POST_RU4 SMA_SMP (3), 8.6 POST_RU2 SMA_SMP (1), 8.6 POST_RU2 SMP_TS (1), 8.7 POST_RTM SMA_SMP (4), 8.7.2 POST SMA_SMP (9)]
-----------------------------------------------------------------------------------------------------
Date: 4/9/2025 5:47:30 AM, Tick Count: 25236796 (07:00:36.7960000), Size: 914 B
Process: AeXSvc (6416), Thread ID: 261, Module: Altiris.NS.dll
Priority: 4, Source: PerformanceSensor

This SMP server has a total CPU count of 32 (see the cpu: 32x2394Mhz entry above), so a suggestion would be to set the threading like this:

priority queue: 4
fast queue: 4
default queue: 4
slow queue: 2
large queue: 1
--
Total: 15 threads, which is already about half of the system's capacity (32 CPUs). This small change should give the SMP Server enough room to catch up with current NSE processing. Once it has caught up, you can set those values back to the defaults.
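If you suspect that an oversized thread pool is causing blocking on the SQL side, a quick check with standard SQL Server DMVs (this is a generic sketch, not an SMP-specific query; it requires VIEW SERVER STATE permission) shows whether sessions are blocking each other while the queues are being processed:

--Sessions currently blocked on the SQL Server hosting the CMDB
select r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text as running_statement
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.blocking_session_id <> 0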

 

5. Check the index fragmentation on common EventQueue tables.
In some scenarios, especially in environments with constant Inventory collection or heavy daily NSE traffic, the EventQueue tables may need to be re-indexed.
Make sure a SQL Maintenance Plan for the Symantec_CMDB database is in place and that it fits the needs of your environment.

Common KB articles suggested are:

SQL Server Implementation Best Practices and Performance Tuning
SQL Maintenance script for the Symantec Management Platform database
Maintenance of your CMDB - analyzing the defragmentation level of CMDB and performing the defragmentation

Some of the tables whose index fragmentation you should watch are:

EventQueue
EventQueueEntryMetaData
EventQueueStatus 

Especially these two:

EventQueueEntry
EventQueueProcess

If you have slow NSE processing, you could try to use SQL Server "Rebuild All" and "Reorganize All" functionality on the indexes used by our Event Queue tables.

Note: In some situations, index defragmentation helps only a little and only for a short period of time. The improvement is insignificant when there is a high volume of NSEs from clients and the SMP processes them in large quantities, because queue entries are added and removed right away. However, even that small improvement can help a good number of NSEs get processed and get you out of a bottleneck.
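To see whether these tables actually need attention, a minimal sketch like the following (standard SQL Server catalog views, run against the Symantec_CMDB database; the thresholds are generic SQL Server guidance, roughly reorganize above 5-30% and rebuild above 30%) lists their current index fragmentation:

--Index fragmentation for the EventQueue tables listed above
select object_name(ips.object_id) as TableName,
       i.name as IndexName,
       ips.avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(), null, null, null, 'LIMITED') ips
join sys.indexes i
  on i.object_id = ips.object_id and i.index_id = ips.index_id
where object_name(ips.object_id) in
      ('EventQueue', 'EventQueueEntry', 'EventQueueEntryMetaData',
       'EventQueueStatus', 'EventQueueProcess')
order by ips.avg_fragmentation_in_percent desc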

6. Review the current queue status.
Check whether there is a discrepancy between how many NSEs the database reports and what the actual EventQueue folder holds. If, for example, the EventQueue folder (under C:\ProgramData\Symantec\SMP\EventQueue\EvtQueue) has 10,000 NSE files but the database shows that more are being processed, that usually indicates something went out of sync, such as the SQL Server not processing incoming NSEs or being hung.

You can use a query like this one to have an idea of how many NSEs are in the queue:

--How many NSEs are referenced on the database

select count (*) from EventQueueEntryMetadata

Another test is to see if an NSE is stuck in the database for processing. Use the following query to see if that is the case. For example, if you run this query about once every minute:

select min(id) as Oldest, max(id) as Newest
from EventQueueEntry

and the "oldest" ID is not moving, then it is most likely that something is stuck. If that is the case, it is time to follow the recommendations from Troubleshoot NSE Processing in 8.5 and later where you will need to stop services and truncate tables so the NSEs in the queue can start processing again.

7. Check if there is a possible issue with Disk I/O.
In most cases, you will need to use Perfmon on your SMP and/or SQL server and analyze how the disks are performing. Issues with the RAID configuration, disk speed, disk type, etc., can slow down how NSEs are written to the physical queues and how that data is read back.
It is also essential that common practices such as disk defragmentation are in place.

Refer to Microsoft documentation on Perfmon and how to analyze Disk usage.

As well similar KBs like these ones:

Create a Performance Monitor counter set for Altiris support
Common Performance Monitor counter thresholds
Creating a Performance Monitor counter set for Notification Server

Note: Another area to check is storage drivers, especially if "Page I/O Latch" wait times are too high.
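As a complement to Perfmon, a quick SQL-side view of per-file I/O latency (a generic sketch using standard SQL Server DMVs, run against the Symantec_CMDB database, not an SMP-specific query) can show whether storage latency is the bottleneck:

--Average read/write stall per database file
select db_name(vfs.database_id) as DatabaseName,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       case when vfs.num_of_reads = 0 then 0
            else vfs.io_stall_read_ms / vfs.num_of_reads end as avg_read_stall_ms,
       case when vfs.num_of_writes = 0 then 0
            else vfs.io_stall_write_ms / vfs.num_of_writes end as avg_write_stall_ms
from sys.dm_io_virtual_file_stats(db_id(), null) vfs
join sys.master_files mf
  on mf.database_id = vfs.database_id and mf.file_id = vfs.file_id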


If the SQL Server is a VMware virtual machine, check that VMware Tools is up to date.

8. Lower the NSE count that is allowed in the EventQueue folder on the SMP.

Having many hundreds of thousands of NSEs in the EventQueue slows down processing, because the NS has to search through both the database tables and the files. More than 50k is not recommended due to the resulting slowness.

NOTE: MaxFileQSize (Default 20,000) has been deprecated and is no longer used to limit EventQueue. Use Core Setting - EvtQueueMaxCount instead.

9. Review whether Persistent Connections (websockets) are used.

If Persistent Connections (Time Critical Management / Endpoint Management Workspaces) has been configured, be advised that Persistent Connections uses a lot of CPU threads keeping connections open on the SMP. If you do not need Persistent Connections, it is advised to turn them off. If you want to use them, it is advised to make the following changes to the Core Settings in the Console (Settings > Notification Server > Core Settings). These items appear in the Console if you search the Active Settings for "msgsthreadpoolsize".

NOTE: It is recommended to make these changes on any system where NSE processing is backing up; the values below are appropriate for an SMP with 32 CPUs. Keep the total thread count under 16 if the SMP has 32 CPUs.

Make the following changes:

  • MaxConcurrentPriorityMsgsThreadPoolSize  --> 4
  • MaxConcurrentFastMsgsThreadPoolSize      --> 4
  • MaxConcurrentDefaultMsgsThreadPoolSize   --> 4
  • MaxConcurrentLargeMsgsThreadPoolSize     --> 2
  • MaxConcurrentSlowMsgsThreadPoolSize      --> 2

10. Things that you should collect when troubleshooting this type of issue

Here are some ideas of things that should help the Support and Engineering teams get a better picture of what could be triggering a performance issue while NSEs are processed in the queues. While looking for the sources of the NSEs, it also helps to take some extra steps for cases like this:

  • Copy of NSEs from C:\ProgramData\Symantec\SMP\EventQueues
  • Collect the following evidence from the Altiris Administrator:
    • a) full NS Logs
    • b) profiling session (using Altiris Profiler) for some minutes when the issue is present
    • c) detailed description of hardware used to install SMP  + SQL
    • d) results of performance monitoring of the SQL server:  RAM usage, number of instances on the server and their load, HDD queue depth, IOPS performance of temp-db, etc.
    • e) list of tasks/policies and their schedules, that can be a source of the NSE flood
    • f) for each virtualization environment - detailed info about resource preallocation, hardware used, hardware status, etc.

Additional Information