Troubleshooting NSE processing issues

Article ID: 205352


Products

IT Management Suite; Client Management Suite

Issue/Introduction

This article addresses the following use cases:

  1. A large number of NSEs are accumulating in the EvtQueue folder
  2. The NSEs are not processing fast enough
  3. You want to know where these NSEs are coming from
  4. A small number of NSEs (fewer than ~50) sit in EvtQueue for hours or days without processing, while all other queue folders (EvtInbox, EvtQFast, EvtQLarge, EvtQPriority, EvtQSlow) appear empty and the server otherwise looks healthy. See Step 6a for the correct resolution.

Environment

ITMS 8.7.x, 8.8.x

Resolution

Overview

The Symantec Management Platform (SMP) uses Notification Server Events (NSEs)—small XML files—to communicate data from endpoints to the server. Slow or stalled NSE processing is a common cause of outdated inventory, delayed policy execution, and overall server performance degradation. 

ITMS 8.5 RU3 and 8.5 RU4 added multiple improvements to NSE processing stability and performance. If you haven't upgraded to the most recent version of ITMS and NSE processing issues are a recurring problem, we recommend that you upgrade and take advantage of those improvements.

In ITMS 8.6, the Dev team made additional changes. Previously, to find candidates to process, the event engine used stored procedures to pick the single oldest NSE per computer, then the oldest NSE for the next computer, and so on, which is quite an expensive SQL query. Starting with ITMS 8.6, the SMP retrieves all NSEs for a computer and performs the processing ordering at the application level, reducing the impact on SQL.

In ITMS 8.8, the Dev team included further enhancements to provide a better picture of what could be triggering all those incoming events.

Background:

The most common reason for NSE processing issues is that client machines are sending too many NSEs at once. Most cases are related either to very aggressive inventory policies (sending delta or full inventory too frequently) or to machines that were disconnected from the internal network for a while (because they are not using Cloud-enabled Management (CEM), or because some other network issue with agent connectivity caused NSEs to accumulate in the local queue folder under ...\program files\Altiris\Altiris Agent\Queue); as soon as these machines connect, they try to send everything that they were holding.

The NS logs should be the starting point for your troubleshooting efforts.

Suggestions:

The following are suggestions on how you could troubleshoot most NSE processing issues.

The PerformanceSensor entries in the NS logs provide internal statistics that are essential for diagnosing the health of the NSE queues. Understanding the EventQueueDispatcher statistics is the first step in diagnosing server performance issues.

Finding the NSE Processing entries from PerformanceSensor

  1. Open the Altiris Log Viewer on the SMP Server (Start > Symantec > Altiris Log Viewer).
  2. Identify the PerformanceSensor entries. 
  3. Locate the [EventQueueDispatcher] information. This section is the most critical area for determining the health of NSE processing. Similar to this:

[EventQueueDispatcher] [running, enabled]
[612.03 k / 2.89 GB] => [32 / 96 / 6.41 m @ 46(0) t, 53.1 i/s, 2.18:03:03]
[Queues]
[0: 37.35 k / 1.45 GB, full] => [0: 16 / 48 @ 1(0,0) c / 600.95 k @ 16(0) t, 30.3 i/s, 14:02:09] [priority .. 19.07 MB]
[1: 574.68 k / 1.44 GB, full] => [1: 16 / 48 @ 16(0,34) c / 5.79 m @ 16(0) t, 22.2 i/s, 13:12:09] [fast .. 244.14 KB]
[2: 3 / 1.53 MB] => [2: 0 / 0 @ 4(0,0) c / 18.15 k @ 8(0) t, 0.5 i/s, 01:23:44] [default .. 4.77 MB]
[3: 0 / 0 B] => [3: 0 / 0 @ 1(0,0) c / 4 @ 4(0) t, 0.0 i/s, 1.11:16:13] [slow .. 19.07 MB]
[4: 0 / 0 B] => [4: 0 / 0 @ 0(0,0) c / 0 @ 2(0) t, 0.0 i/s, 2.18:03:03] [large, 19.07 MB +]
[Overall]
[threads: 32 @ 0, queue: 32 (max: 0), done: # 6.41 m (48.33 GB), speed: 0.0 i/s (0 Bps)]
succeeded: # 6.40 m (47.55 GB), 0.1 i/s (19.98 KBps), 0.0 / 0.0 / 0.3 / 0.0
failed: # 1.51 k (793.38 MB), 0.0 i/s (554.2 Bps), 0.0 / 0.0 / 0.0 / 0.0
-----------------------------------------------------------------------------------------------------
Date: 10/30 6:33:40 AM, Tick Count: 237845984 (2.18:04:05.9840000), Host Name: SMPServer, Size: 1.11 KB
Process: AeXSvc (10096), Thread ID: 45, Module: Altiris.NS.dll
Priority: 4, Source: PerformanceSensor


Interpreting EventQueueDispatcher Statistics

The [EventQueueDispatcher] section provides a snapshot of the messages waiting for processing and the engine's current speed.

    1. Review the Overall Dispatcher Line:
      • Example: [612.03 k / 2.89 GB] => [32 / 96 / 6.41 m @ 46(0) t, 53.1 i/s, 2.18:03:03]
      • The first part, [612.03 k / 2.89 GB], represents the Total Pending NSEs (count / size) waiting for the system to process.
    2. Review the [Queues] Sub-Section: This breaks down the pending NSEs by internal queue.
      • Example: [1: 135.71 k / 644.61 MB] => [1: 15 / 39 @ 16(14,38) c / 192.88 k @ 16(16) t, 43.2 i/s] [fast .. 244.14 KB]
    3. Identify High Queue Count: Focus on the first number in the queue sample: [1: 135.71 k / 644.61 MB]
      • A count over 50,000 to 80,000 in any single queue is a strong indicator of a backlog. This forces the system to process messages from the same resource sequentially, leading to heavy database query load and overall slowdown.
      • Note: The number of items (count) is usually a more dramatic indicator of performance issues than the total size (MB).
    4. The following table decodes every field position in both the overall and per-queue lines for reference during troubleshooting:


      Field / Position | Meaning | What to Watch For
      612.03 k | Total pending NSE count across all queues | Above 50,000–80,000 in a single queue = backlog warning. Count causes more SQL pressure than size.
      2.89 GB | Total pending NSE size | Secondary indicator. Size alone rarely causes issues without a high count.
      32 | NSEs currently being dispatched (actively processing) | Should be non-zero if the queue is non-empty. Zero with a full queue = stall.
      96 | NSEs loaded into memory ready for dispatch | -
      6.41 m | Total NSEs processed since service start (cumulative) | Useful for rate comparison across consecutive PerformanceSensor samples.
      46(0) t | Active threads (threads currently idle) | 46 active, 0 idle. Active = 0 with queue full and speed = 0.0 i/s means the dispatcher is stalled.
      53.1 i/s | Current processing speed (NSEs per second) | 0.0 i/s with a non-empty queue = stall condition requiring immediate action.
      2.18:03:03 | Service uptime (D:HH:MM:SS) | Resets when AltirisClientMsgDispatcher is restarted.
      full | Queue has reached its configured size cap (~1.45 GB for priority/fast queues) | When full, PostEvent begins rejecting new incoming NSEs. SMA agents will retry later but may appear stale.
      16(0,34) c | Processing chains (locked chains, pending count) for this queue | Locked chains persisting for extended periods may indicate SQL contention or a stuck chain.



Log Entry Analysis:

      • [EventQueueDispatcher]: This is the main NSE processing queue. The logs clearly show the "priority" and "fast" queues are marked as "full", having reached their size limits of ~1.45 GB each.
        [0: 37.35 k / 1.45 GB, full] => [0: 16 / 48 @ 1(0,0) c / 600.95 k @ 16(0) t, 30.3 i/s, 14:02:09] [priority .. 19.07 MB]
        [1: 574.68 k / 1.44 GB, full] => [1: 16 / 48 @ 16(0,34) c / 5.79 m @ 16(0) t, 22.2 i/s, 13:12:09] [fast .. 244.14 KB]
      • This line "[612.03 k / 2.89 GB] => [32 / 96 / 6.41 m @ 46(0) t, 53.1 i/s, 2.18:03:03]" means:
        [pending count / pending size] => [processing now / loaded in memory / total done] @ [total active threads(current) / max threads], speed, uptime
      • This line "[1: 574.68 k / 1.44 GB, full] => [1: 16 / 48 @ 16(0,34) c / 5.79 m @ 16(0) t, 22.2 i/s, 13:12:09] [fast .. 244.14 KB]" means:
        [queue id: pending count / pending size, status] => [queue id: processing / pending in memory @ chains(locked, pending) / total processed @ active slots(active threads), speed, last activity] [queue name .. max file size]
      • The high number of pending items (over 600k) and the "full" status are the key indicators. The system is unable to process events as fast as they are arriving.

 [NSMessageQueue] — What It Is and How to Read It 

Important: The PerformanceSensor log also emits [NSMessageQueue] entries alongside [EventQueueDispatcher]. This component is not related to NSE processing from client endpoints. It is an internal NS infrastructure queue that passes messages between NS processes and plugins using the NS API. Its presence in the logs can cause confusion during troubleshooting.

When investigating NSE issues, simply confirm this component shows [running, enabled] and over limit: False, then focus your attention on [EventQueueDispatcher].

Example [NSMessageQueue] entry:

[NSMessageQueue] [running, enabled, uptime: 2:18:03:22]

[queue: 0 (@0), added: 14.77 m @ 0.0 i/s (0.00 | 0.00 | 0.00 | 0.00), peak: 97 (6.41 k)]

[processing: 14.77 m @ 0.0 i/s (0.00 | 0.00 | 0.00 | 0.00)]

[data: 6.71 GB @ 0 Bps (0 Bps | 0 Bps | 0 Bps | 0 Bps)]

[raiser: 0, added: 127.11 k @ 4.4 i/s (3.54 | 2.93 | 5.78 | 5.40), peak: 31]

[settings: 100 k, wait: 200, over limit: False]

 

Field | Meaning | What to Look For
[running, enabled, uptime: D:HH:MM:SS] | Component status and how long it has been running | Confirm running, enabled. Uptime resets on AltirisClientMsgDispatcher restart.
queue: 0 (@0) | Current queue depth and active processing slots | Should be near 0. Brief spikes of dozens to low hundreds are expected and normal.
added: 14.77 m @ 0.0 i/s | Total messages added since uptime; current rate | High cumulative total (millions) is completely normal. Focus on current rate, not lifetime count.
peak: 97 (6.41 k) | Peak queue depth observed this window (97) and session all-time high (6,410) | Brief peaks under a few hundred are expected. Sustained high values warrant attention.
settings: 100 k, wait: 200 | Max queue depth allowed (100,000); wait interval in ms | Compare queue depth to this cap for headroom.
over limit: False | Whether queue has exceeded its capacity limit | over limit: True is the ONLY value here requiring immediate action. False = healthy.

 

NOTE: High counts are normal

Cumulative totals in the millions and historical peaks in [NSMessageQueue] are completely normal and should not be treated as evidence of a problem. This engine processes internal NS API messages extremely fast and peak load is typically under 100 messages at any moment.

 

 

Verifying Incoming NSE Delivery

  1. Locate the [PostEvent] entries. This section reports statistics on the engine that receives NSEs from agents and delivers them to the EventQueueDispatcher.
    Here is an example of this type of entry:

    [PostEvent] [file system]
     succeeded: # 11.48 k (3.76 GB), 0.1 i/s (24.05 KBps), 0.0 / 0.0 / 0.4 / 0.0
     failed: # 1.48 k (308.13 MB), 18.3 i/s (3.85 MBps), 0.0 / 0.1 / 33.8 / 39.3
    -----------------------------------------------------------------------------------------------------
    Date: 10/30 6:33:40 AM, Tick Count: 237845984 (2.18:04:05.9840000), Host Name: SMPServer, Size: 410 B
    Process: AeXSvc (10096), Thread ID: 45, Module: Altiris.NS.dll
    Priority: 4, Source: PerformanceSensor

    [PostEvent] field notes 

    The label [file system] immediately after [PostEvent] indicates that NSEs are being delivered to the EventQueueDispatcher via the file system (EvtInbox folder) — this is the standard delivery mode.

    The four rate values in parentheses, e.g. 0.0 / 0.1 / 33.8 / 39.3, represent throughput across four progressive time windows (from shortest to longest average). This helps distinguish whether a rate spike is very recent or has been sustained.

    Incoming speed spike as an early warning: Monitor the succeeded bytes-per-second value across consecutive PerformanceSensor samples. A sudden sharp increase — especially coinciding with a scheduled inventory or task run — is an early indicator that the EventQueueDispatcher may be approaching full capacity. Immediately check the [EventQueueDispatcher] pending count when such a spike is observed.



  2. Check the "failed" statistics. The failure count indicates how many NSEs were not delivered.
    • A significant failure count here can be a sign that the EventQueueDispatcher is full or failing (e.g., due to the queue count exceeding an internal core setting like EvtQueueMaxCount), causing the server to reject new NSEs:
       failed: # 1.48 k (308.13 MB), 18.3 i/s (3.85 MBps), 0.0 / 0.1 / 33.8 / 39.3

When the server rejects NSEs, the Symantec Management Agent (SMA) will attempt to resend them later, but the agents may appear disconnected or stale in the console (The Notification Server rejects NSEs and the Sym Agents show as disconnected).

Reading PerformanceSensor as a Time Series 

NSE processing problems rarely appear in a single log snapshot. Comparing PerformanceSensor samples over time — at baseline, issue onset, and full degradation — reveals what changed and how the problem progressed:

 

Stage | [NSMessageQueue] | [EventQueueDispatcher] | [PostEvent] | Interpretation
Baseline (healthy) | rate: ~194 i/s; over limit: False | pending: 0 / 0 B; threads active; speed > 0 | succeeded only; failed: 0 | System healthy. Queues draining normally.
Issue onset | rate drops to ~97 i/s; over limit: False | pending: 274k / 2 GB; priority queue: full; threads still active | succeeded still climbing; failed: 0 yet | Queue filling fast. Identify and address the source policy now.
Full degradation | rate: 0.0 i/s; all windows zero | pending: 612k / 2.89 GB; priority + fast: full; speed: 0.0 i/s; 0 active threads | failed: 1.48k at 18 i/s; agents now rejected | Dispatcher stalled. Restart AltirisClientMsgDispatcher. Check NS logs for SQL transport errors.


Key rule: When [NSMessageQueue] rates drop to zero while [EventQueueDispatcher] is still filling, the bottleneck is in the NSE dispatcher, not in NS internal messaging. [PostEvent] failures appear after the dispatcher is already full — this is the expected sequence, not a separate additional problem.


After having a better picture of what the NS logs are saying, you can now move to troubleshooting the NSE queue issues.  

1. Understand what those NSEs are and where they are coming from.

Try to identify what type of NSEs these are (Basic Inventory, Hardware Inventory, login/logoff events, etc.), as well as whether they are coming from certain machines.

There are two ways to identify these incoming NSEs:

Using "Event Data Analytics":


With the ITMS 8.8 release, there is a new System Health feature: Metadata Statistics (for Event Data Analytics). New reports should help you narrow down patterns and identify which policies may need adjustments.

Refer to "Using Event Data Analytics for understanding SMP Server performance".


Using SSETools:

You can use SSETools "NSE diagnostics", which can help you see the NSE type (displayed under Scenario Counts) and which machines the NSEs come from (under Resource counts).

 

NOTE: SSETools limitation
SSETools only analyzes file-based events in EvtQueue. However, these are just a small fraction of all events; smaller NSEs are kept directly in the database as inline messages.

Another tool that can be used is: Evaluating NSE data using SQL when a deeper analysis is needed.

NOTE: Capturing NSEs for analysis
In some situations, where too many NSEs are received and they are being processed faster than you can review them, you can capture them and save a copy of them in a different folder.  How to Capture processed NSEs on the Notification Server.
You can also capture "bad" NSEs that are being ignored. See: Enable collection of bad NSEs for review

[NseMeta] log entries for identifying top NSE sources — ITMS 8.8

Starting with ITMS 8.8, the NS log includes entries prefixed [NseMeta]. These are rebuilt approximately every hour and show the most active NSE types processed in that window — useful for quickly identifying which policy or task is generating the highest volume or failure count.


Example:

[NseMeta] 00:28:47, # 184

'Collect Full Inventory - GroupA' (acf24e2f-...): 14.43 k (1.99 GB), failed: 1, queues: {fast,default}, time taken: 1:38:42.16

'Custom Inventory - AppList' (d614d3f1-...): 11.48 k (582.38 MB), failed: 0, queues: {fast}, time taken: 0:31:45.60


Field | Meaning
00:28:47, # 184 | Time since last flush; number of distinct NSE types tracked in this window
Policy name + GUID | The SMP policy or task generating these NSEs
14.43 k (1.99 GB) | Count and total size of NSEs processed for this policy in the window
failed: 1 | NSEs for this policy that exhausted all retry attempts
queues: {fast,default} | Internal queues that handled this policy's NSEs. Spanning both means some NSEs exceeded the fast-queue size threshold (244.14 KB).
time taken: 1:38:42.16 | Cumulative CPU time across all threads, NOT wall-clock time

 

EventFailureBackupFolder core setting — preserving failed NSEs 

When an NSE fails all retry attempts, it is deleted from EvtQueue and removed from the processing table. To preserve failed NSEs for inspection instead of discarding them:

Navigate to: Settings > Notification Server > Core Settings

Search for: EventFailureBackupFolder

Set the value to a valid local folder path (e.g., C:\NSE_Failures). Failed NSEs will be moved there rather than deleted, allowing you to inspect malformed XML, invalid characters, or missing data class references.

Note on inline vs. file NSEs: Smaller NSEs are stored directly in the database (inline) rather than as physical files in EvtQueue. For inline NSEs, the database entry is removed on final failure regardless of this setting. SSETools only sees file-based NSEs — always supplement with SQL queries for a complete picture.

 

 

If you prefer to use the information available in the database, you can use queries to show you what may be happening:

NOTE: ITMS 8.8 reports
ITMS 8.8 includes new reports, such as Pending Events, under:

    • Reports > Notification Server Management > Server > Event Queue
      • Processed Events Summary
      • Processed Events Timeline

Average NSE count per computer:

DECLARE @compcount AS INT = (SELECT COUNT(*) FROM vComputer)

SELECT ItemName, COUNT(ResourceGuid) / @compcount

FROM Evt_NS_Event_History h

WHERE _eventTime >= GETDATE() - 1

GROUP BY ItemName

ORDER BY 2 DESC

 

Find machines with the most NSEs (above 500):

SELECT c.Name, [Source], COUNT(*) FROM EventQueueEntry e

JOIN vRM_Computer_Item c ON c.Guid = e.[Source]

GROUP BY c.Name, [Source]

HAVING COUNT(*) > 500

ORDER BY 3 DESC

 

Machines and NSE totals for a specific time period:

SELECT DISTINCT c.Guid, c.Name, COUNT(*) EventCount

FROM Evt_NS_Event_History h

JOIN vRM_Computer_Item c ON c.Guid = h.ResourceGuid

WHERE 1 = 1

AND h._eventTime BETWEEN '2024-12-22 06:00:00.00' AND '2024-12-23 11:00:00.00'

GROUP BY c.Guid, c.Name

ORDER BY 3 DESC

 

Drill into NSE types for a specific machine (use GUID from query above):

SELECT _eventTime, ItemGuid, ItemName, ResourceName

FROM Evt_NS_Event_History

WHERE ResourceGuid = 'Add computer GUID here'

AND _eventTime BETWEEN '2024-12-22 06:00:00.00' AND '2024-12-23 11:00:00.00'

ORDER BY _eventTime

Knowing which machines may be the biggest offenders and what type of NSEs are being sent, you should be able to narrow down why those machines are sending that many NSEs (for example, whether they are sending Basic Inventory more than once a day, collecting inventory too frequently, etc.). A sample check for this is shown below.
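
As an illustration, here is a minimal sketch of such a check, assuming Basic Inventory events are recorded in Evt_NS_Event_History with an ItemName containing "Basic Inventory" (verify the filter against the actual ItemName values seen in your environment):

-- Machines that sent Basic Inventory more than once in the last 24 hours
-- (the ItemName filter is an assumption; adjust it to the ItemName values in your data)
SELECT h.ResourceGuid, c.Name, COUNT(*) AS BasicInvCount
FROM Evt_NS_Event_History h
JOIN vRM_Computer_Item c ON c.Guid = h.ResourceGuid
WHERE h.ItemName LIKE '%Basic Inventory%'
AND h._eventTime >= GETDATE() - 1
GROUP BY h.ResourceGuid, c.Name
HAVING COUNT(*) > 1
ORDER BY 3 DESC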

If you notice that multiple machines are sending a large number of NSEs, and their local queue (under ...\program files\Altiris\Altiris Agent\Queue) still holds too many NSEs, you can use the "FlushAgentEvents" core setting to instruct client machines to stop sending those NSEs and clear out their own queues. See KB: Clearing queued events on endpoints.

 

 

2. Verify PauseActivities is not Enabled on the SMP server.

If you notice that multiple NSEs are coming in but nothing seems to be processing, verify that the SMP services (Altiris Service, Altiris File Receiver Service, and Altiris Client Message Dispatcher service) are not stopped.
Also check whether the following registry values are set to 1 (1 = activities are paused, 0 = processing normally):

HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\eXpress\Notification Server\PauseActivities
HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\eXpress\Notification Server\PausedNSMessaging

3. Enable extra verbosity on the NS logs for NSE processing.

Open the Altiris Log Viewer on the SMP server and enable extended verbosity under Options > Extended verbosities.
 
This should reveal a large amount of additional statistics in the logs for analysis.

 

4. Verify that there is not an issue with possible poor SMP or SQL Server performance.

This is a more complicated step to validate since you will need to monitor the current state of your SQL server and depend on a DBA to do some troubleshooting. 
With recent versions of the SMP (8.1 and later), the NS logs should show you a quick snapshot of what your systems are doing. Look for "PerformanceSensor" source in the NS logs. It should look like this:

[SYSTEM]
 [app cpu: 0%, ram: 301.34 MB / 1%, uptime: 57.11:50:59.1137164]
 [ns cpu: 3%, ram: 4.70 GB / 24%, uptime: 55.18:31:50.3437500]
 [sql cpu: 4%, ram: 9.15 GB / 58.5% (Available physical memory is high), cpu history %: 23 / 3 / 3 / 3 / 3 / 17 / 4 / 3 / 3]
 [ns machine: SMP-MAIN (V), ram: 19.53 GB, cpu: 1x1995Mhz, versions: 8.5.5032.0, assembly: 8.5.5032.0]
 [sql machine: sql-main (V), ram: 15.62 GB, cpu: 1x1, affinity: 2 (AUTO), version: 13.0.5026.0 / Enterprise Edition (64-bit) / SP2, trip: 320]
 [pc physical: 0, virtual: 5, managed: 5, connectivity: 5, hierarchy: 0, ps: 1, ts: 2]
 [.NET 4.0.30319.42000]
-----------------------------------------------------------------------------------------------------
Date: 12/18 11:16:37 AM, Tick Count: 672484485 (7.18:48:04.4850000), Host Name: SMP-MAIN, Size: 835 B
Process: AeXSvc (3064), Thread ID: 47, Module: AeXSVC.exe
Priority: 4, Source: PerformanceSensor

Vital information about CPU and memory usage on both your SMP and SQL servers is displayed, as well as allocated memory, whether the servers are virtual or physical, and other details.
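
If these entries show sustained high SQL CPU or long waits, a DBA can also check for blocking sessions directly on the SQL Server that hosts the CMDB. This is a minimal, generic sketch using the standard SQL Server dynamic management views, not an SMP-specific procedure:

-- Sessions currently blocked by another session (run against the SQL Server hosting the Symantec_CMDB)
SELECT r.session_id,
       r.blocking_session_id,              -- session holding the resource
       r.wait_type,
       r.wait_time,                        -- milliseconds waited so far
       DB_NAME(r.database_id) AS database_name,
       r.command,
       r.status
FROM sys.dm_exec_requests r
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC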

Using the same "PerformanceSensor" source in the NS logs, you should be able to see queues information:

[Queues]
 [0: 0 / 0 B] => [0: 0 / 0 / 275 @ 16(0) t, 0.0 i/s, 04:18:00] [priority .. 19.07 MB]
 [1: 0 / 0 B] => [1: 0 / 0 / 4.85 k @ 16(0) t, 0.5 i/s, 02:42:10] [fast .. 244.14 KB]
 [2: 0 / 0 B] => [2: 0 / 0 / 24 @ 8(0) t, 0.0 i/s, 3.17:16:11] [default .. 4.77 MB]
 [3: 0 / 0 B] => [3: 0 / 0 / 0 @ 4(0) t, 0.0 i/s, 57.11:53:08] [slow .. 19.07 MB]
 [4: 0 / 0 B] => [4: 0 / 0 / 0 @ 2(0) t, 0.0 i/s, 57.11:53:08] [large, 19.07 MB +]
[Lifetime]
 [t=0, a=0, q=0, peak=0, done=5,146, speed=0.00, bps=0]
-----------------------------------------------------------------------------------------------------
Date: 12/18 11:20:32 AM, Tick Count: 672719407 (7.18:51:59.4070000), Size: 817 B
Process: AeXSvc (3064), Thread ID: 46, Module: AeXSVC.exe
Priority: 4, Source: PerformanceSensor

This should give you an idea of how busy the queues are, which queue seems to be the busiest, and whether the default or custom values are used for queue processing. The example entry above shows a normal state: no busy queues, using the default core setting values.

NOTE: Queue ID reference
We have 5 queues (represented by the queueId column in EventQueueEntry table and the Id column in EventQueue table):

      • 0 - priority queue
      • 1 - fast
      • 2 - normal
      • 3 - slow
      • 4 - large

MaxConcurrentPriorityMsgsThreadPoolSize is for the priority queue  
MaxConcurrentFastMsgsThreadPoolSize is for the fast queue  
MaxConcurrentDefaultMsgsThreadPoolSize is for the normal/default queue
MaxConcurrentSlowMsgsThreadPoolSize is for the slow queue
MaxConcurrentLargeMsgsThreadPoolSize is for the large queue
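
To see how the pending backlog is distributed across these queues on the database side, you can group the staging table by queue ID. A minimal sketch, assuming the queueId column mentioned above (column casing may vary between versions):

-- Pending NSE count per internal queue (0 = priority, 1 = fast, 2 = normal, 3 = slow, 4 = large)
SELECT e.QueueId, COUNT(*) AS PendingCount
FROM EventQueueEntry e
GROUP BY e.QueueId
ORDER BY e.QueueId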

After having an understanding of the resources available and how busy the servers are:

a) You may need to reboot the SQL Server or restart its SQL services.
b) You may need to follow Troubleshoot NSE Processing in 8.x, which provides guidance on truncating the EventQueue tables.


NOTE: 
Example of a bad queue processing configuration (from an ITMS 8.7.2 SMP Server having NSE processing issues):

[EventQueueDispatcher] [running, enabled]
 [76.91 k / 3.25 GB] => [300 / 462 / 57.08 k @ 500(300) t, 1.2 i/s, 07:00:24]
[Queues]
 [0: 21.25 k / 1021.10 MB] => [0: 100 / 156 @ 1(0,0) c / 1.91 k @ 100(100) t, 0.0 i/s, 00:00:25] [priority .. 20 MB]
 [1: 52.02 k / 862.45 MB] => [1: 100 / 150 @ 100(98,148) c / 52.42 k @ 100(100) t, 1.0 i/s] [fast .. 244.14 KB]
 [2: 3.64 k / 1.41 GB] => [2: 100 / 156 @ 71(50,0) c / 2.51 k @ 100(100) t, 0.2 i/s, 00:00:11] [default .. 4.77 MB]
 [3: 0 / 0 B] => [3: 0 / 0 @ 16(0,0) c / 245 @ 100(0) t, 0.0 i/s, 00:01:06] [slow .. 20 MB]
 [4: 0 / 0 B] => [4: 0 / 0 @ 1(0,0) c / 2 @ 100(0) t, 0.0 i/s, 02:08:40] [large, 20 MB +]
[Overall]
 [threads: 300 @ 300, queue: 300 (max: 301), done: # 57.08 k (3.21 GB), speed: 1.2 i/s (126.55 KBps)]
 [succeeded: # 57.08 k (3.21 GB), 1.2 i/s (126.55 KBps), 1.1 / 2.6 / 0.3 / 0.9]
 [failed: # 8 (1.47 MB), 0.0 i/s (1.54 KBps), 0.0 / 0.0 / 0.0 / 0.0]
-----------------------------------------------------------------------------------------------------
Date: 4/9/2025 5:47:30 AM, Tick Count: 25236859 (07:00:36.8590000), Size: 1.11 KB
Process: AeXSvc (6416), Thread ID: 261, Module: Altiris.NS.dll
Priority: 4, Source: PerformanceSensor

They are using 100 threads for each event queue (see the @100 values above).
This is too many NSEs being processed at the same time and brings problems, not performance improvements.
More threads mean more deadlocks.

If you look at their [SYSTEM] log entry:

[SYSTEM]
 [ns cpu: 3%, ram: 9.25 GB / 14%, uptime: 6:47:20]
 [ns machine: SMPNS01 (V), ram: 64.00 GB, cpu: 32x2394Mhz, assembly: 8.7.3391.0, versions: 8.7.3391.0 (4/30/2024) / 8.7.1273.0 (5/4/2023) / 8.6.3268.0 (3/8/2022) / 8.6.1119.0 (2/18/2021) / 8.5.5713.0 (11/16/2020)]
 [ns os: Microsoft Windows Server 2016 Standard, 10.0.14393, en-US, TZ -420]
 [pc physical: 41179, virtual: 94, managed: 25085, policied in 24h: 17687, in cem: 9394, ps: 25, ts: 26]
 [licensing status: Expired: 3, Ok: 6]
 [fixes: 8.5 POST RU4, 8.5 POST RU4 ECV (v2), 8.5 POST RU4 ULM (v1), 8.5 POST_RU4 SMA_SMP (3), 8.6 POST_RU2 SMA_SMP (1), 8.6 POST_RU2 SMP_TS (1), 8.7 POST_RTM SMA_SMP (4), 8.7.2 POST SMA_SMP (9)]
-----------------------------------------------------------------------------------------------------
Date: 4/9/2025 5:47:30 AM, Tick Count: 25236796 (07:00:36.7960000), Size: 914 B
Process: AeXSvc (6416), Thread ID: 261, Module: Altiris.NS.dll
Priority: 4, Source: PerformanceSensor

This SMP server has a Total CPU count of 32 (see above under cpu: 32x2394Mhz entry), so a suggestion would be to set threading like this:

priority queue: 4
fast queue: 4
default: 4
slow: 2
large: 2

Total: 16 threads, which is half of the system power (32 CPUs). 


SQL transport-level failures as a cause of NSE stalls 

An important but often overlooked cause of NSE processing stalls is intermittent network-level disconnections between the SMP server and SQL Server. These are not ordinary SQL timeout errors — they are transport-level drops that occur while the SMP holds SQL connections open across batched processing operations.

Even a very brief drop (seconds) can leave NSE processing chains marked as “in progress” in the database when in fact no thread is working on them. These stranded chains do not recover automatically while the service is running.

Log signatures — search for these in the NS logs:

Severity | Log Pattern | Meaning
Critical | "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)" | Network path to SQL Server dropped momentarily.
Critical | "An existing connection was forcibly closed by the remote host." | SQL Server or a network device actively closed the TCP connection.
Critical | "The semaphore timeout period has expired." | SQL connection attempt timed out at the network layer.
Warning | "DatabaseContext finalizer called, which should not happen. This: [D: 1/65/0] {ConnOwner, Invalid, ReadCommitted, Closed}" | A SQL context was garbage-collected in an invalid/dead state, indicating an earlier transport failure in the same process.
Error | "The current database context is invalid due to a previous critical error. [InvalidDatabaseContextException @ Altiris.Database.dll]" | Downstream effect of a transport failure: operations failing because the SQL context is already dead.

 

How to trace a DatabaseContext failure back to its origin

The DatabaseContext finalizer called message contains a thread ID that points to the original failure:

DatabaseContext finalizer called, which should not happen.

This: [D: 1/65/0] {ConnOwner, Invalid, ReadCommitted, Closed} id=3268, t='None', s=1/65 id=3267, AdminDatabaseContext

The second number in the s=N/XX field (here 65) is the original Thread ID of the code that caused the SQL context to be invalidated.

    1. Note the Thread ID from the s=N/XX field.
    2. Search the same NS log file (same process ID) for earlier errors or warnings from that thread ID.
    3. The earlier entry will show the original transport failure — this is the true root cause. The DatabaseContext finalizer message is a consequence, not the origin.

WARNING: Standard SQL ping tools may not detect this

SQL ping tools (including the SQL Test in SSETools) may show normal response times while the NS log fills with transport errors. These drops are extremely brief. Wireshark packet captures between the SMP and SQL servers during an active failure window provide the most reliable evidence. Also review Windows Event Logs and SQL Server error logs at the same timestamps.

 

5. Check the index fragmentation on common EventQueue tables

In some scenarios, especially in environments where inventories are constantly being collected or there is heavy NSE traffic on a daily basis, the EventQueue tables may need to be re-indexed.
Make sure that a SQL maintenance plan for the Symantec_CMDB database is in place and that it fits the needs of your environment.

Common KB articles suggested are:

SQL Server Implementation Best Practices and Performance Tuning
SQL Maintenance script for the Symantec Management Platform database
Maintenance of your CMDB - analyzing the defragmentation level of CMDB and performing the defragmentation

Some of the tables whose index fragmentation you should watch are:

EventQueue
EventQueueEntryMetaData
EventQueueStatus 

Especially these two:

EventQueueEntry
EventQueueProcess

If you have slow NSE processing, you could try to use SQL Server "Rebuild All" and "Reorganize All" functionality on the indexes used by our Event Queue tables.

NOTE: Index fragmentation impact
In some situations, index maintenance helps only a little and for a short period of time. The improvement is insignificant when a high volume of NSEs arrives from clients and the SMP processes them in large quantities, because NSEs are added and removed right away. However, that small improvement can help you get a good number of NSEs processed and get you out of a bottleneck.
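
To check current fragmentation on these tables before deciding between a rebuild and a reorganize, here is a minimal sketch using the standard sys.dm_db_index_physical_stats DMV. Run it against the Symantec_CMDB database; the thresholds in the comment are common general guidance, not SMP-specific values:

-- Index fragmentation for the EventQueue* tables (LIMITED mode keeps the scan cheap)
-- Common guidance: reorganize above ~10-15% fragmentation, rebuild above ~30%
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE OBJECT_NAME(ips.object_id) LIKE 'EventQueue%'
ORDER BY ips.avg_fragmentation_in_percent DESC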

6. Review the current queue status

Check whether there is a discrepancy between how many NSEs the database reports and what the actual EventQueue folder contains. If, for example, the EventQueue folder (under C:\ProgramData\Symantec\SMP\EventQueue\EvtQueue) has 10,000 NSE files but the database shows more in processing, that usually indicates something went out of sync, for example that the SQL Server is not processing incoming NSEs or is hung.

You can use a query like this one to have an idea of how many NSEs are in the queue:

--How many NSEs are referenced on the database

select count (*) from EventQueueEntryMetadata

Another test is to see whether an NSE is stuck in the database for processing. Use the following query, running it about once every minute, and compare the results:

select min(id) as Oldest, max(id) as Newest
from EventQueueEntry

If the "oldest" ID is not moving, then it is most likely something is stuck. If that is the case, it is time to follow the recommendations from Troubleshoot NSE Processing in 8.x where you will need to stop services and truncate tables so the NSEs in the queue can start processing again.

Step 6a — Resolving a small number of stuck NSEs without a large backlog 

The KB Troubleshoot NSE Processing in 8.x truncation procedure above is designed for large-scale backlogs of thousands of NSEs caused by major events such as SQL crashes or catastrophically misconfigured policies. A separate, distinct scenario exists where only a small number of NSEs (fewer than ~50) sit in EvtQueue for hours or days while all other queue folders appear empty and the server looks otherwise healthy.

The root cause is a SQL transport-level failure (see Step 4) that left internal processing chains marked as “in progress” in the database — when in fact no thread is working on them. The AltirisClientMsgDispatcher service contains a built-in consistency-check routine that resolves these stranded chains, but this check only runs at service startup.


CRITICAL: Do NOT manually move NSE files from EvtQueue to EvtInbox

NSEs must be processed in chronological order per resource. Moving files manually from EvtQueue to EvtInbox bypasses this ordering and can cause data inconsistencies in the Symantec_CMDB database.

The KB 172741 procedure (move files to temp folder → truncate SQL tables → copy back to EvtInbox) is only appropriate for catastrophic large-scale backlogs — not for a handful of organically stuck events.


Correct resolution for a small number of stuck NSEs:

  1. Confirm the NSEs are stuck: run the MIN/MAX EventQueueEntry query twice, ~2 minutes apart. If Oldest does not advance, the chain is stalled.
  2. Check NS logs for SQL transport-level error signatures (see Step 4 additions).
  3. Diagnose which specific entries are stranded in the processing table using the diagnostic query below. The key column is CreatedDate (last column). If any row shows a CreatedDate many hours or days before the current time, that NSE chain is stuck in EventQueueProcess while no code is working on it:

SELECT p.*, m.CreatedDate

FROM EventQueueProcess p

JOIN EventQueueEntryMetaData m ON m.Id = p.Id

ORDER BY m.CreatedDate ASC

A CreatedDate value that is hours or days old confirms a stuck chain. The NSE was accepted into processing but the SQL failure that occurred during that operation left it permanently locked in EventQueueProcess with no active thread to complete it.

    4. If stuck entries are confirmed, choose one of these two options:

        • Option A (preferred — safe): Restart the AltirisClientMsgDispatcher service. This triggers the built-in consistency check and moves stranded entries back to the pre-processing table automatically.

      Navigate to: Settings > Notification Server > Internals > Core Performance → Click [Restart] on the Client Message Dispatcher row.

        • Option B (SQL script — use only if a service restart is not immediately possible): The script below identifies entries in EventQueueProcess older than a configurable threshold and moves them back to EventQueueEntry, making them eligible for re-dispatch. This does not delete any NSE data.

       

      WARNING: Read before running this script

      The default age threshold is 6 hours (@nseMaxAgeHours = 6). Adjust this value to match your environment before running. Run the diagnostic query above first to confirm what will be affected. Each fix is wrapped in a transaction — a failure on any entry rolls back only that entry and the loop exits.

      After running the script, the fixed NSEs will be picked up by the AltirisClientMsgDispatcher on its next dispatch cycle. No service restart is required for Option B, but a restart is still recommended afterwards to run the full consistency check.

      -- Define the age of NSE to suspect as stuck

      DECLARE @nseMaxAgeHours INT = 6

      DECLARE @maxDate DATETIME = DATEADD(HOUR, -@nseMaxAgeHours, GETDATE())

      DECLARE @fixedMessages INT = 0

      DECLARE @agedId BIGINT

      SET NOCOUNT ON

       

      -- Loop until no more aged messages are found

      WHILE 1=1

      BEGIN

          SET @agedId = NULL

          SELECT TOP 1 @agedId = me.Id

          FROM EventQueueEntryMetaData me

          JOIN EventQueueProcess pro ON pro.Id = me.Id

          WHERE me.CreatedDate <= @maxDate

       

          IF @agedId IS NULL

          BEGIN

              PRINT ('There is no more aged NSE with processing state')

              BREAK

          END

          ELSE

          BEGIN

              PRINT ('Found aged NSE #' + CAST(@agedId AS VARCHAR(10)) + ', fixing')

              BEGIN TRY

                  BEGIN TRAN

                  -- Move processing messages back to staging table

                  DELETE      eqp

                      OUTPUT  DELETED.Id, DELETED.QueueId, DELETED.Priority, DELETED.Source

                      INTO    EventQueueEntry

                      FROM    EventQueueProcess eqp

                      WHERE   eqp.Id = @agedId

                  COMMIT TRAN

                  SET @fixedMessages = @fixedMessages + 1

              END TRY

              BEGIN CATCH

                  IF XACT_STATE() <> 0 ROLLBACK TRAN

                  DECLARE @ErrMsg      NVARCHAR(4000) = ERROR_MESSAGE()

                         ,@ErrSeverity INT            = ERROR_SEVERITY()

                         ,@ErrState    INT            = ERROR_STATE()

                  RAISERROR(@ErrMsg, @ErrSeverity, @ErrState)

                  BREAK

              END CATCH

          END

      END

       

      IF @fixedMessages > 0

      BEGIN

          PRINT ('Fixed ' + CAST(@fixedMessages AS VARCHAR(10)) + ' entries, updating queue statistics.')

          DECLARE @dummy TABLE (x INT, y INT, z BIGINT, t DATETIME)

          INSERT INTO @dummy EXEC spGetQueueStats 1

      END

      ELSE

      BEGIN

          PRINT ('No aged NSEs were spotted')

      END

      SET NOCOUNT OFF




    5. After completing Option A or Option B, monitor [EventQueueDispatcher] entries in the NS log. Confirm active thread counts increase and the EvtQueue file count decreases.
    6. If SQL transport errors are frequent or recurring, engage the network/DBA team to investigate the root cause. Schedule a periodic restart of AltirisClientMsgDispatcher during off-peak hours as a recurring mitigation.

 

7. Check if there is a possible issue with Disk I/O

In most cases, you may need to use Perfmon on your SMP and/or SQL server and analyze how the disks are performing. Issues with the RAID configuration, disk speed, disk type, etc. can add slowness to how NSEs are written to the physical queues and how that data is read.
It is also essential that common practices like disk defragmentation are in place.

Refer to Microsoft documentation on Perfmon and how to analyze Disk usage.

Also review similar KBs like these:

Create a Performance Monitor counter set for Altiris support
Common Performance Monitor counter thresholds
Creating a Performance Monitor counter set for Notification Server

NOTE: Storage drivers and VMware
Another area to check is storage drivers, especially if "Page I/O Latch" waits are too high.
 

If the SQL Server is a VMware virtual machine, check that VMware Tools is up to date.

8. Lower the NSE Count that is allowed in the EventQueue folder on the SMP

Having many hundreds of thousands of NSEs in the EventQueue will slow down processing, as the NS has to search through both the database tables and the files. More than 50,000 is not recommended due to the resulting slowness.

NOTE: MaxFileQSize deprecated
MaxFileQSize (Default 20,000) has been deprecated and is no longer used to limit the size of the Event Queue. Use Core Setting - EvtQueueMaxCount instead.
Important: The value for EvtQueueMaxCount must be entered as a plain integer. Do not use “k” notation. For example, to set a limit of 50,000 enter 50000, not “50k”.

9. Reviewing if Persistent Connections (websockets) are used

If Persistent Connections (Time Critical Management / Endpoint Management Workspaces) have been configured, be advised that Persistent Connections use a lot of CPU threads to keep connections open on the SMP. If you don't need Persistent Connections, it is advised to turn them off. If you want to use them, it is advised to make the following changes to the Core Settings in the Console (Settings > Notification Server > Core Settings). These items will show in the Console if you search the Active Settings for "msgsthreadpoolsize".

Thread Pool Sizing Rule

The combined total of all MaxConcurrent*MsgsThreadPoolSize values should not exceed 50% of the SMP server's available CPU core count for standard environments. For environments with heavy IIS load (many active agent connections), active Persistent Connections (websockets), or ongoing hierarchy replication, consider limiting total threads to one-third (33%) of available CPU cores. More threads cause more SQL deadlocks and do not improve throughput.

CPU Count Changes

If the SMP server's CPU count has been reduced (e.g., due to VM rightsizing), review and reduce thread pool settings proportionally. These settings do not auto-adjust when CPU resources are changed.

SMP CPU Cores | Standard Max Total Threads (50%) | Heavy IIS/Replication (33%) | Suggested Distribution (Priority / Fast / Default / Slow / Large)
8 | 4 | 3 | 1 / 1 / 1 / 1 / 0
16 | 8 | 5 | 2 / 2 / 2 / 1 / 1
32 | 16 | 10 | 4 / 4 / 4 / 2 / 2
64 | 32 | 20 | 8 / 8 / 8 / 4 / 4

Example: These changes are recommended for any system that is backed up with NSEs, and they are appropriate for an SMP with 32 CPUs. Keep the total thread count under 16 if the SMP has 32 CPUs.

Make the following changes:

    • MaxConcurrentPriorityMsgsThreadPoolSize  → 4
    • MaxConcurrentFastMsgsThreadPoolSize      → 4
    • MaxConcurrentDefaultMsgsThreadPoolSize   → 4
    • MaxConcurrentLargeMsgsThreadPoolSize     → 2
    • MaxConcurrentSlowMsgsThreadPoolSize      → 2

To configure thread pool settings:

    1. Navigate to: Settings > Notification Server > Core Settings
    2. In the search box, type: msgsthreadpoolsize
    3. Locate and adjust the five settings: MaxConcurrentPriorityMsgsThreadPoolSize, MaxConcurrentFastMsgsThreadPoolSize, MaxConcurrentDefaultMsgsThreadPoolSize, MaxConcurrentSlowMsgsThreadPoolSize, MaxConcurrentLargeMsgsThreadPoolSize
    4. Restart the AltirisClientMsgDispatcher service to apply changes.

Bad Configuration Example

The following is an example of a severely over-threaded configuration observed in a production environment (ITMS 8.7.2, 32-CPU SMP server):

[EventQueueDispatcher] [running, enabled]

[76.91 k / 3.25 GB] => [300 / 462 / 57.08 k @ 500(300) t, 1.2 i/s, 07:00:24]

[Queues]

[0: 21.25 k / 1021.10 MB] => [0: 100 / 156 @ 1(0,0) c / 1.91 k @ 100(100) t, 0.0 i/s, 00:00:25] [priority .. 20 MB]

[1: 52.02 k / 862.45 MB] => [1: 100 / 150 @ 100(98,148) c / 52.42 k @ 100(100) t, 1.0 i/s] [fast .. 244.14 KB]

[2: 3.64 k / 1.41 GB] => [2: 100 / 156 @ 71(50,0) c / 2.51 k @ 100(100) t, 0.2 i/s, 00:00:11] [default .. 4.77 MB]

This configuration uses 100 threads per queue (500 total) on a 32-CPU server. The recommended maximum for this server is 16 total threads. The result is extreme SQL deadlock contention and a processing speed of only 1.2 i/s — despite 300 threads being active.

Correct settings for this server (32 CPUs): Priority: 4 / Fast: 4 / Default: 4 / Slow: 2 / Large: 2 = 16 total.

EvtQueueMaxCount — Limiting Queue Depth

The core setting EvtQueueMaxCount limits the total number of NSEs allowed in the EventQueueDispatcher at one time. When this limit is reached, the queue is marked 'full' and new NSEs are rejected until space is available (the SMA agent will retry sending them).

EvtQueueMaxCount Value Format

The value must be entered as a plain integer. Do not use 'k' notation. For example, to set a limit of 50,000, enter 50000, not '50k'.

Recommended starting values:

    • 50000 — for environments experiencing heavy SQL pressure and wanting to limit queue depth aggressively.
    • 100000 — for larger environments (e.g., 50,000+ managed computers) with adequate SQL resources.

Note: MaxFileQSize (previously used to limit queue size) has been deprecated and is no longer effective. Use EvtQueueMaxCount instead.
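
To see how close the current backlog is to the limit you have chosen, a minimal sketch is shown below. The 50000 value is only the example starting value from above, not a value read from Core Settings; replace it with whatever you configured:

-- Compare the current pending NSE count to the EvtQueueMaxCount value you configured
DECLARE @evtQueueMaxCount INT = 50000   -- replace with the value set in Core Settings
SELECT COUNT(*) AS PendingEntries,
       @evtQueueMaxCount AS ConfiguredLimit,
       @evtQueueMaxCount - COUNT(*) AS Headroom
FROM EventQueueEntry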

Apply the SmpTopContextMode Core Setting

This setting was introduced in SMA_SMP_8_8_PF_v10 (see CUMULATIVE POST ITMS 8.8 RTM(GA) POINT FIXES (KB 400510)). It controls how the SMP manages SQL database connections when intermittent SQL connectivity drops are detected.

Value | Description | When to Use
0 | Conservative: SQL connections opened/closed per operation | Use if instability continues after trying value 2.
1 | Optimized (default): connections held open for batched operations | Normal operation with stable SQL connectivity.
2 | Balanced: first mitigation step when SQL transport errors are observed | Start here when transport-level SQL errors appear in logs.

To configure:

    1. Navigate to: Settings > Notification Server > Core Settings
    2. Search for: SmpTopContextMode
    3. If the setting does not appear, confirm that SMA_SMP_8_8_PF_v10 or later is installed.
    4. Set the value to 2.
    5. Monitor NS logs — if transport errors decrease and NSE processing stabilises, no further change is needed.
    6. If instability continues, change to 0 and re-engage the network/DBA team for root cause investigation.

10. Items that you should collect for troubleshooting this type of issue

Here are some items that should help Support and Engineering get a better idea of what could be triggering a performance issue:

    • Copy of NSEs from C:\ProgramData\Symantec\SMP\EventQueues
    • There is a newer feature in the Console: Core Performance (Settings > Notification Server > Internals > Core Performance).
      • This can be used to keep track of NSE Processing, resource usage, etc. 
    • Collect the evidence as the Altiris Administrator:
      • a) full NS Logs
      • b) profiling session (using Altiris Profiler) for some minutes when the issue is present
      • c) detailed description of hardware used to install SMP + SQL
      • d) results of performance monitoring of the SQL server:  RAM usage, number of instances on the server and their load, HDD queue depth, IOPS performance of temp-db, etc.
      • e) list of tasks/policies and their schedules, that can be a source of the NSE flood
      • f) for each virtualization environment - detailed info about resource preallocation, hardware used, hardware status, etc.
      • g) SQL Server error logs from the same time window as the NS log errors
      • h) Network trace (Wireshark capture between SMP and SQL servers) if SQL transport-level drops are suspected and standard logs cannot identify the cause
      • i) Core Performance page screenshots before and after any service restarts (Settings > Notification Server > Internals > Core Performance)

Additional Information

Evaluating NSE data using SQL when a deeper analysis is needed

How to check for excessive NSE files in the EventQueue and be alerted when too high

SMA_SMP Cumulative Point Fix (includes SmpTopContextMode core setting)

Agents appearing disconnected or stale due to rejected NSEs