Troubleshooting NSE Processing Issues in ITMS 8.x

Article ID: 205352


Products

IT Management Suite, Client Management Suite

Issue/Introduction

This article addresses the following scenarios:

  • Use Case 1: A large number of NSEs are accumulating in the EvtQueue folder and not draining.
  • Use Case 2: NSEs are not processing fast enough — inventory, policy status, and task results are delayed by hours or days.
  • Use Case 3: You need to identify which machines or policies are generating the most NSE traffic.
  • Use Case 4: A small number of NSEs (fewer than 50) sit in EvtQueue indefinitely without processing while all other queue folders appear empty and the server otherwise looks healthy.

Environment

IT Management Suite (ITMS) / Client Management Suite (CMS)
ITMS 8.5 RU3 through 8.8 (with applicable Point Fixes)
AeXSvc.exe — Notification Server (NS) main process
AltirisClientMsgDispatcher (controls NSE queue processing)
Altiris File Receiver Service (w3wp.exe / IIS)

Cause

The SMP Server processes Notification Server Event (NSE) files to receive inventory, policy compliance, and status data from managed endpoints. NSE processing issues fall into two broad categories:

  • Volume-based backlogs: Too many NSEs arrive faster than the server can process them. Root causes include overly aggressive inventory policies, large numbers of agents reconnecting after a network outage, or under-resourced SQL/SMP servers.
  • Stall-based blockages: A small number of NSEs stop processing due to an internal state inconsistency — most commonly caused by intermittent SQL connectivity failures that leave processing chains stranded in the database.

Both types produce similar symptoms (NSEs sitting in EvtQueue), but require different responses. This article guides you through diagnosing which category applies before prescribing a resolution.

ℹ️ Related Articles

For a high-level entry-point overview of NSE backlog scenarios: NSE Backlog: Troubleshooting Delayed Reporting and Event Processing (KB 421492)

Probable Cause Table

Use the following table to identify the most likely root cause before proceeding to troubleshooting steps:

| Probable Cause | Likelihood | Key Evidence | Notes |
|---|---|---|---|
| SQL transport-level network failure | High | "A transport-level error has occurred" in NS log; "DatabaseContext finalizer called" | Restart AltirisClientMsgDispatcher to clear stalled chains; engage the network/DBA team |
| Overly aggressive inventory policy | High | High NSE count from specific machines (SSETools or SQL query); [NseMeta] log shows large policy counts | Stagger schedules; reduce frequency or scope |
| Thread pool over-configuration | Medium | @100 threads per queue in PerformanceSensor; high SQL deadlock count | Reduce total threads to ≤ 50% of CPU count |
| SQL index fragmentation | Medium | Slow query execution; MIN(id) in EventQueueEntry not advancing | Rebuild/reorganize indexes on EventQueue* tables |
| Malformed / broken NSEs | Medium | "Failed to load inventory" and invalid-character errors in NS log | Identify the source script/policy; use EventFailureBackupFolder to capture NSEs for inspection |
| PauseActivities registry key set | Low | Services running but zero processing; registry key = 1 | Set the registry key to 0 |
| Disk I/O bottleneck | Low | High disk queue in Perfmon; slow file writes to the EvtQueue folder | Check RAID health, VM storage reservations, VMTools version |
| Missing data class for custom inventory | Low | "NSE dispatch failed" with a specific GUID; GUID not found in database | Disable or update the orphaned custom inventory script |

Resolution


Overview

The Symantec Management Platform (SMP) uses Notification Server Events (NSEs) — compressed XML files — to communicate data from endpoints to the server. Slow or stalled NSE processing is a common cause of outdated inventory, delayed policy execution, and overall server performance degradation.

ITMS 8.5 RU3 and 8.5 RU4 introduced multiple stability and performance improvements for NSE processing. ITMS 8.6 further optimised the SQL query strategy for finding processing candidates (from per-query single-pick to application-level ordering). ITMS 8.8 added Event Data Analytics reports and the [NseMeta] log instrumentation for deeper visibility into incoming event patterns.

If you are on an older version and NSE processing issues are recurring, upgrading to the most recent supported release is strongly recommended.

⚠️ Common Background Cause

The most frequent cause of NSE processing issues is client machines sending too many NSEs simultaneously. This may be due to:

  • Overly aggressive inventory policies collecting delta or full inventory too frequently.
  • Many machines that were offline reconnecting and flushing accumulated NSEs at once.
  • Agents without Cloud-enabled Management (CEM) building up local queues under ...\Program Files\Altiris\Altiris Agent\Queue.

Step 1 — Diagnose the Backlog Using Console Reports (ITMS 8.8)

The most effective first step in ITMS 8.8 is to use the built-in Event Queue reports:

  1. Navigate to:  Reports > Notification Server Management > Server > Event Queue > Metadata Statistics
  2. Run the Pending Events report. If the Total Pending NSE Count is consistently above 50,000–80,000, or if the queue is not clearing during off-peak hours, you have a confirmed backlog.
  3. Run the Processed Events Summary report to identify which source, policy, or product is generating the highest NSE volume over the past 24 hours.
  4. Run the Processed Events Timeline report to identify spikes correlating with scheduled policies or tasks.

For more information on these reports, see: Using Event Data Analytics for understanding SMP Server performance (KB 398266)

Step 2 — Read the NS Logs: PerformanceSensor Entries

The NS logs are the primary diagnostic tool for NSE processing issues. Open the Altiris Log Viewer on the SMP Server (Start > Symantec > Altiris Log Viewer) and filter by source: PerformanceSensor.

Three log prefixes are relevant. Understanding what each one covers prevents misdiagnosis:

2A — [EventQueueDispatcher] — Primary NSE Processing Queue

This is the most important entry for NSE troubleshooting. It reports the real-time state of the NSE processing engine.

Format — Overall Dispatcher Line:

[EventQueueDispatcher] [running, enabled]

[612.03 k / 2.89 GB] => [32 / 96 / 6.41 m @ 46(0) t, 53.1 i/s, 2.18:03:03]

| Field | Meaning | What to Look For |
|---|---|---|
| [running, enabled] | Component status | Must show running, enabled. If disabled, check the PauseActivities registry keys. |
| 612.03 k | Pending NSE count waiting for processing | ⚠️ Above 50,000–80,000 in a single queue is a backlog warning. Above 100k is critical. |
| 2.89 GB | Total pending NSE size | Size is secondary to count; count causes more SQL pressure than size. |
| 32 | NSEs currently being dispatched (in active processing) | Should be non-zero if the queue is non-empty. |
| 96 | NSEs loaded into memory and queued for dispatch | |
| 6.41 m | Total NSEs processed since service start | Cumulative counter; useful for rate comparison across samples. |
| 46(0) t | Active threads (threads currently idle) | 46 active, 0 idle. If active = 0 while the queue is full, the dispatcher is stalled. |
| 53.1 i/s | Current processing speed (NSEs per second) | 0.0 i/s with a non-empty queue = stall condition. |
| 2.18:03:03 | Service uptime (D.HH:MM:SS) | Resets when AltirisClientMsgDispatcher is restarted. |
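When comparing many exported log samples, the fields above can be pulled out programmatically. The following is a minimal sketch, not a product tool: it parses the stats portion of an [EventQueueDispatcher] line per the field layout above and applies the stall rule from the table. The function names and the regex are illustrative assumptions.

```python
import re

# Multipliers for the abbreviated counts used in PerformanceSensor output.
UNITS = {"": 1, "k": 1_000, "m": 1_000_000}

def parse_dispatcher_line(line):
    """Parse the [EventQueueDispatcher] stats line into a dict.

    Extracts the pending NSE count, active/idle thread counts, and the
    current processing speed; returns None if the layout does not match.
    """
    m = re.search(
        r"\[(?P<pending>[\d.]+)\s*(?P<punit>k|m)?\s*/\s*[\d.]+\s*\w+\]\s*=>"
        r".*@\s*(?P<active>\d+)\((?P<idle>\d+)\)\s*t,\s*(?P<speed>[\d.]+)\s*i/s",
        line,
    )
    if not m:
        return None
    pending = float(m.group("pending")) * UNITS[m.group("punit") or ""]
    return {
        "pending": round(pending),
        "active_threads": int(m.group("active")),
        "idle_threads": int(m.group("idle")),
        "speed_ips": float(m.group("speed")),
    }

def is_stalled(stats):
    # Per the table: a non-empty queue with zero speed or zero active
    # threads indicates a stall condition.
    return stats["pending"] > 0 and (
        stats["speed_ips"] == 0.0 or stats["active_threads"] == 0
    )

sample = "[612.03 k / 2.89 GB] => [32 / 96 / 6.41 m @ 46(0) t, 53.1 i/s, 2.18:03:03]"
stats = parse_dispatcher_line(sample)
```

Running several samples through such a parser makes the rate comparison described later in this article (baseline vs. onset vs. degradation) much easier than eyeballing raw log lines.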

Format — Per-Queue Line:

[1: 574.68 k / 1.44 GB, full] => [1: 16 / 48 @ 16(0,34) c / 5.79 m @ 16(0) t, 22.2 i/s, 13:12:09] [fast .. 244.14 KB]

Field

Meaning

What to Look For

1:

Queue ID (0=priority, 1=fast, 2=default, 3=slow, 4=large)

See queue ID table below.

574.68 k / 1.44 GB

Pending count / pending size for this queue

High count = heavy SQL ordering pressure for sequential processing.

full

Queue has reached its size limit

Priority and fast queues each have a ~1.45 GB size cap. 'full' means new NSEs for this queue are rejected.

16 / 48

Currently processing / loaded in memory

 

16(0,34) c

Chains (locked chains, podcast count)

Locked chains > 0 may indicate processing contention.

5.79 m

Total processed by this queue

Cumulative.

16(0) t

Active slots (active threads)

0 active threads on a non-empty queue = stall.

22.2 i/s

Queue processing speed

 

13:12:09

Time since last activity on this queue

Long duration with no activity on a non-empty queue = stall indicator.

fast .. 244.14 KB

Queue name and the max NSE file size routed to this queue

NSEs larger than this threshold go to the next queue.

Queue ID Reference

| Queue ID | Queue Name | NSE Size Threshold | Core Setting (Max Threads) | Default Threads |
|---|---|---|---|---|
| 0 | Priority | Up to 19.07 MB (small, high-priority) | MaxConcurrentPriorityMsgsThreadPoolSize | 16 |
| 1 | Fast | Up to 244.14 KB | MaxConcurrentFastMsgsThreadPoolSize | 16 |
| 2 | Default / Normal | Up to 4.77 MB | MaxConcurrentDefaultMsgsThreadPoolSize | 8 |
| 3 | Slow | Up to 19.07 MB (large) | MaxConcurrentSlowMsgsThreadPoolSize | 4 |
| 4 | Large | Above 19.07 MB | MaxConcurrentLargeMsgsThreadPoolSize | 2 |
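As a worked illustration of the size thresholds above, the sketch below routes an NSE to a queue by file size alone. This is a simplification and an assumption: per the table, the priority queue also accepts small NSEs, and selection of the priority queue is presumed here to be driven by message priority rather than size, so size-based routing starts at the fast queue.

```python
# Size thresholds (bytes) taken from the Queue ID Reference table above.
KB, MB = 1024, 1024 * 1024

# (queue_id, queue_name, max_nse_size_bytes) in routing order. The priority
# queue (0) is omitted: it is assumed to be chosen by priority flag, not size.
SIZE_ROUTES = [
    (1, "fast", 244.14 * KB),
    (2, "default", 4.77 * MB),
    (3, "slow", 19.07 * MB),
    (4, "large", float("inf")),
]

def route_by_size(nse_size_bytes):
    """Return (queue_id, queue_name) for an NSE of the given size,
    walking the thresholds in ascending order."""
    for qid, name, limit in SIZE_ROUTES:
        if nse_size_bytes <= limit:
            return qid, name
```

For example, a 1 MB NSE exceeds the 244.14 KB fast-queue threshold and falls through to the default queue, which matches the behavior described in the per-queue line above ("NSEs larger than this threshold go to the next queue").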


2B — [PostEvent] — NSE Delivery Statistics

This entry reports statistics for the engine that receives NSEs from agents and delivers them into the EventQueueDispatcher. The label [file system] indicates that NSEs are being delivered via the EvtInbox folder mechanism — this is the standard delivery mode.

[PostEvent] [file system]

succeeded: # 11.48 k (3.76 GB), 0.1 i/s (24.05 KBps), 0.0 / 0.0 / 0.4 / 0.0

failed: # 1.48 k (308.13 MB), 18.3 i/s (3.85 MBps), 0.0 / 0.1 / 33.8 / 39.3

| Field | Meaning | What to Look For |
|---|---|---|
| succeeded: # 11.48 k (3.76 GB) | Total NSEs successfully delivered, and total size | Normal: agents are sending and delivery is working. |
| 0.1 i/s (24.05 KBps) | Current delivery rate | Watch for sudden spikes (e.g., 3+ MB/s), which are early warnings of impending queue saturation. |
| 0.0 / 0.0 / 0.4 / 0.0 | Rate breakdown across four time windows (short to long) | Helps detect whether a rate change is recent or sustained. |
| failed: # 1.48 k (308.13 MB) | NSEs that could not be delivered; agents will retry | A rising failed count means the EventQueueDispatcher is full or experiencing failures. SMA agents will retry later but may appear stale. |
| 18.3 i/s (3.85 MBps) | Current failure rate | Any non-zero failure rate warrants checking EventQueueDispatcher queue status. |

⚠️ Incoming Spike as an Early Warning

Monitor the succeeded bytes-per-second value across consecutive PerformanceSensor samples. A sudden significant increase in the incoming rate — especially coinciding with a scheduled inventory or task policy run — is an early warning that the EventQueueDispatcher may be approaching full capacity. When a spike is observed, immediately check the [EventQueueDispatcher] pending count.
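This check can be automated when PerformanceSensor samples are collected on a schedule. The sketch below flags a spike when the latest succeeded bytes-per-second reading is several times the recent average and above an absolute floor; the factor and floor values are illustrative starting points, not product defaults.

```python
def incoming_spike(samples_bps, factor=5.0, min_bps=1_000_000):
    """Flag a sudden increase in the PostEvent succeeded delivery rate.

    samples_bps: chronological list of succeeded bytes-per-second readings.
    Returns True when the newest sample is both >= min_bps (about 1 MB/s,
    an illustrative floor) and at least `factor` times the average of the
    earlier samples.
    """
    if len(samples_bps) < 2:
        return False  # need a baseline to compare against
    baseline = sum(samples_bps[:-1]) / (len(samples_bps) - 1)
    latest = samples_bps[-1]
    return latest >= min_bps and (baseline == 0 or latest / baseline >= factor)
```

Using the sample values from this article: a steady run of 24.05 KBps readings raises no alarm, while a jump to 3.85 MBps does, prompting an immediate check of the [EventQueueDispatcher] pending count.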

2C — [NSMessageQueue] — Internal NS Infrastructure Queue

This entry is not related to NSE processing from client endpoints. NSMessageQueue is an internal NS infrastructure component that passes messages between NS processes and plugins using the NS API. Its statistics appear alongside NSE-related entries in PerformanceSensor logs, which can cause confusion.

[NSMessageQueue] [running, enabled, uptime: 2:18:03:22]

[queue: 0 (@0), added: 14.77 m @ 0.0 i/s (0.00 | 0.00 | 0.00 | 0.00), peak: 97 (6.41 k)]

[processing: 14.77 m @ 0.0 i/s (0.00 | 0.00 | 0.00 | 0.00)]

[data: 6.71 GB @ 0 Bps (0 Bps | 0 Bps | 0 Bps | 0 Bps)]

[raiser: 0, added: 127.11 k @ 4.4 i/s (3.54 | 2.93 | 5.78 | 5.40), peak: 31]

[settings: 100 k, wait: 200, over limit: False]

| Field | Meaning | What to Look For |
|---|---|---|
| [running, enabled, uptime: ...] | Component status and uptime | Confirm running, enabled. Uptime resets on service restart. |
| queue: 0 (@0) | Current queue depth and active slots | Should be near 0. Brief spikes are expected and normal. |
| added: 14.77 m @ 0.0 i/s | Total messages added; current rate | A high cumulative total is normal. Focus on the current rate, not the lifetime count. |
| peak: 97 (6.41 k) | Peak queue depth observed (97) and session high (6,410) | Brief peaks under a few hundred are expected. Sustained highs warrant attention. |
| settings: 100 k, wait: 200 | Maximum queue depth allowed (100,000); wait interval in ms | Compare queue depth to this limit for headroom. |
| over limit: False | Whether the queue has exceeded capacity | over limit: True is the ONLY value here requiring immediate action. False = healthy. |

✅ Key Point

High total counts (millions) and historical peaks in NSMessageQueue are completely normal and should not be treated as evidence of a problem. When investigating NSE processing issues, you can validate this component is healthy simply by confirming it shows [running, enabled] and over limit: False, then move on to [EventQueueDispatcher].

2D — Reading PerformanceSensor as a Time Series

NSE processing issues rarely appear in a single log snapshot. Comparing multiple samples over time reveals the progression. Collect PerformanceSensor samples at baseline, at issue onset, and during full degradation, then compare:

| Stage | NSMessageQueue | EventQueueDispatcher | PostEvent | Interpretation |
|---|---|---|---|---|
| Baseline (healthy) | rate: ~194 i/s; over limit: False | pending: 0 / 0 B; threads active; speed > 0 i/s | succeeded only; failed: 0 | Queues draining normally. No action needed. |
| Issue onset | rate drops: ~97 i/s; over limit: False | pending: 274k / 2 GB; priority queue full; threads still active | succeeded still rising; failed: 0 yet | Queue filling fast. Dispatcher struggling. Incoming rate still high. Act now: investigate the source policy. |
| Full degradation | rate: 0.0 i/s; all rates zero | pending: 612k / 2.89 GB; priority + fast both full; speed: 0.0 i/s; 0 active threads | failed: 1.48k at 18 i/s; agents now rejected | Dispatcher stalled. SMA agents cannot deliver NSEs. Restart AltirisClientMsgDispatcher. Investigate SQL errors. |

Key diagnostic rule: When [NSMessageQueue] processing rates drop to zero while [EventQueueDispatcher] is still filling, the bottleneck is in the NSE dispatcher — not in NS internal messaging. [PostEvent] failures appear after the dispatcher is already full — this is the expected sequence.
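The progression above can be condensed into a rough triage rule for scripted monitoring. The sketch below is illustrative only: the 50,000-pending threshold comes from this article's backlog guidance, and the exact cut-offs are judgment calls rather than product-defined boundaries.

```python
def classify_stage(pending, dispatch_speed_ips, active_threads, failed_rate_ips):
    """Map an [EventQueueDispatcher]/[PostEvent] snapshot onto the three
    stages in the table above.

    pending: pending NSE count; dispatch_speed_ips: current i/s;
    active_threads: dispatcher threads in use; failed_rate_ips: PostEvent
    failure rate. Thresholds are illustrative.
    """
    stalled = pending > 0 and (dispatch_speed_ips == 0.0 or active_threads == 0)
    if stalled or failed_rate_ips > 0:
        # Dispatcher stalled and/or agents already being rejected.
        return "full degradation"
    if pending > 50_000:
        # Queue filling while threads are still active: act now.
        return "issue onset"
    return "baseline"
```

Fed with the sample figures from the table, this returns "baseline" for the healthy snapshot, "issue onset" at 274k pending with active threads, and "full degradation" at 612k pending with zero speed.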

Step 3 — Identify What NSEs Are and Where They Come From

Before taking any remediation action, identify the NSE type and source so the correct corrective action is applied.

Using [NseMeta] Log Entries (ITMS 8.8)

Starting with ITMS 8.8, the NS log includes entries prefixed with [NseMeta]. These are rebuilt approximately every hour and show the most active NSE types processed during that window.

[NseMeta] 00:28:47, # 184

'Collect Full Inventory - GroupA' (acf24e2f-...): 14.43 k (1.99 GB), failed: 1, queues: {fast,default}, time taken: 1:38:42.16

'Custom Inventory - AppList' (d614d3f1-...): 11.48 k (582.38 MB), failed: 1, queues: {fast,default}, time taken: 0:31:45.60

| Field | Meaning |
|---|---|
| 00:28:47, # 184 | Time since last flush; number of distinct NSE types tracked in this window |
| Policy name + GUID | The SMP policy or task generating these NSEs |
| 14.43 k (1.99 GB) | Count and total size of NSEs processed for this policy in the window |
| failed: 1 | Number of NSEs for this policy that exhausted all retry attempts |
| queues: {fast,default} | Internal queues that handled this policy's NSEs. Spanning both means some NSEs exceeded the fast-queue size threshold. |
| time taken: 1:38:42.16 | Cumulative CPU time across all threads, NOT wall-clock time |

Use [NseMeta] to: Identify which policy generates the highest NSE volume or failure rate; detect unexpectedly large or failing custom inventory scripts; and validate that NSE counts reduce after policy adjustments.

Using SSETools — NSE Diagnostics

SSETools provides a visual breakdown of NSE types (Scenario Counts) and source machines (Resource Counts) for file-based NSEs in the EvtQueue folder.

Download: SSETools (KB 150132)

⚠️ SSETools Limitation

SSETools only analyzes file-based NSEs stored in the EvtQueue folder. Smaller NSEs are stored as inline entries directly in the database (EventQueueEntry table) and are not visible to SSETools. Always supplement SSETools analysis with the SQL queries below for a complete picture.

Using SQL Queries for NSE Analysis

Use the following queries against the Symantec_CMDB database:

Average NSE count per computer:

DECLARE @compcount AS INT = (SELECT COUNT(*) FROM vComputer)

SELECT ItemName, COUNT(ResourceGuid) * 1.0 / @compcount AS AvgPerComputer

FROM Evt_NS_Event_History h

WHERE _eventTime >= GETDATE() - 1

GROUP BY ItemName

ORDER BY 2 DESC

Find machines with the most NSEs (above 500):

SELECT c.Name, [Source], COUNT(*)

FROM EventQueueEntry e

JOIN vRM_Computer_Item c ON c.Guid = e.[Source]

GROUP BY c.Name, [Source]

HAVING COUNT(*) > 500

ORDER BY 3 DESC

Machines and NSE counts for a specific time period:

SELECT c.Guid, c.Name, COUNT(*) AS EventCount

FROM Evt_NS_Event_History h

JOIN vRM_Computer_Item c ON c.Guid = h.ResourceGuid

WHERE h._eventTime BETWEEN '2024-12-22 06:00:00.00' AND '2024-12-23 11:00:00.00'

GROUP BY c.Guid, c.Name

ORDER BY 3 DESC

Drill into NSE types for a specific machine (use GUID from query above):

SELECT _eventTime, ItemGuid, ItemName, ResourceName

FROM Evt_NS_Event_History

WHERE ResourceGuid = '<computer GUID here>'

  AND _eventTime BETWEEN '2024-12-22 06:00:00.00' AND '2024-12-23 11:00:00.00'

ORDER BY _eventTime

Source-to-Action Decision Table

Once you have identified the source, use this table to select the appropriate response:

| If the Source Is... | Action to Take |
|---|---|
| Scheduled Basic Inventory | Stagger the schedule to prevent all agents reporting simultaneously. Navigate to: Settings > Agents/Plug-ins > Targeted Agent Settings > Symantec Management Agent Settings. Modify the Basic Inventory schedule to use randomized start times. |
| A specific Solution Policy (e.g., Software Inventory) | Navigate to the policy (e.g., Manage > Policies > Software). Review its schedule and Applied To targets. Reduce the frequency or narrow the target population. |
| Agents flushing queued NSEs after network reconnect | Use the FlushAgentEvents core setting to instruct client machines to clear their local agent queues. In the SMP Console, go to Settings > Notification Server > Core Settings > Filter and search for "FlushAgentEvents". See: Clearing queued events on endpoints in SMP 8.x (KB 175204) |
| Custom inventory script referencing a deleted data class | Identify the GUID in the 'NSE dispatch failed' log error. Query the database to confirm the data class is gone. Disable or update the custom inventory script. |
| No clear source identified | Temporarily disable all non-essential inventory policies for 2 hours. If the backlog begins clearing, re-enable policies one by one to isolate the cause. |

NSE Type Lifecycle and Inline vs. File NSEs

NSE types — inline vs. file-based:

  • Inline NSE: Small NSEs stored directly as rows in the EventQueueEntry database table. No physical file in EvtQueue. Not visible to SSETools.
  • File NSE: Larger NSEs stored as physical .nse files in C:\ProgramData\Symantec\SMP\EventQueue\EvtQueue and referenced by the database.

What happens when an NSE fails to process:

  1. The dispatcher retries the NSE a configured number of times.
  2. If all retries fail, the NSE file is deleted from EvtQueue and its database entry is removed.
  3. If the EventFailureBackupFolder core setting is configured, the failed NSE is moved to that folder for post-failure inspection instead of being deleted.
  4. For inline NSEs, the database entry is simply removed on final failure.

ℹ️ EventFailureBackupFolder Core Setting

To preserve failed NSEs for analysis:

Navigate to: Settings > Notification Server > Core Settings

Search for: EventFailureBackupFolder

Set the value to a valid local folder path (e.g., C:\NSE_Failures). Failed NSEs will be moved here instead of deleted, allowing inspection of malformed XML or invalid data class references.

Step 4 — Verify PauseActivities Is Not Enabled

If NSEs are arriving but nothing is processing, verify that the SMP services are running and activities are not paused:

  1. Confirm these three services are running: Altiris Service, Altiris File Receiver Service, Altiris Client Message Dispatcher Service.
  2. Check the following registry keys. A value of 1 means processing is paused:

    HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\eXpress\Notification Server\PauseActivities

    HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\eXpress\Notification Server\PausedNSMessaging


  3. If either key is set to 1, set it to 0 and restart the Altiris Service.

Step 5 — Verify SMP and SQL Server Performance

Review the [SYSTEM] PerformanceSensor entry for CPU, memory, and SQL health:

[SYSTEM]

[app cpu: 0%, ram: 301.34 MB / 1%, uptime: 57.11:50:59]

[ns cpu: 3%, ram: 4.70 GB / 24%, uptime: 55.18:31:50]

[sql cpu: 4%, ram: 9.15 GB / 58.5% (Available physical memory is high), cpu history %: 23/3/3/3/3/17/4/3/3]

[ns machine: SMP-MAIN (V), ram: 19.53 GB, cpu: 1x1995Mhz, versions: 8.5.5032.0]

[sql machine: sql-main (V), ram: 15.62 GB, cpu: 1x1, affinity: 2 (AUTO), version: 13.0.5026.0 / Enterprise Edition (64-bit)]

Identifying SQL Transport-Level Failures

A critical but often overlooked root cause is intermittent network-level disconnections between the SMP server and SQL Server. These are not standard SQL timeout errors — they are transport-level drops that occur while the SMP holds SQL connections open for batched processing operations.

Even a very brief drop (seconds) can cause a disproportionate impact: the SMP fails to complete partially-written database operations, leaving NSE processing chains marked as 'in progress' in the database — when in fact no thread is working on them. These chains are not automatically recovered while the service is running.

SQL transport-level error signatures — search for these in NS logs:

| Severity | Log Pattern | Meaning |
|---|---|---|
| Critical | "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)" | Network path to SQL Server dropped momentarily. |
| Critical | "An existing connection was forcibly closed by the remote host." | SQL Server or a network device actively closed the TCP connection. |
| Critical | "The semaphore timeout period has expired." | SQL connection attempt timed out at the network layer. |
| Warning | "DatabaseContext finalizer called, which should not happen. This: [D: 1/65/0] {ConnOwner, Invalid, ReadCommitted, Closed}" | A SQL context object was garbage-collected in an invalid/dead state; indicates an earlier transport failure in the same process. |
| Error | "The current database context is invalid due to a previous critical error. [InvalidDatabaseContextException @ Altiris.Database.dll]" | Downstream effect of a transport failure: operations failing because the SQL context is already dead. |
| Error | "NSE dispatch failed for: ... Failed to load inventory. [AeXException @ Altiris.NS.dll] ... A transport-level error has occurred..." | NSE dispatch aborted mid-processing due to a SQL transport failure. This NSE may now be stuck. |

How to Trace a DatabaseContext Failure Back to Its Source

The DatabaseContext finalizer called message contains a clue to find the original failure:

DatabaseContext finalizer called, which should not happen.

This: [D: 1/65/0] {ConnOwner, Invalid, ReadCommitted, Closed} id=3268, t='None', s=1/65 id=3267, AdminDatabaseContext

The second number in the s=N/XX field (here 65) is the original Thread ID of the code that caused the SQL context to be invalidated.

  1. Note the Thread ID from the s=N/XX field.
  2. Search the NS log (same file, same process ID) for earlier errors or warnings from that exact thread ID.
  3. The earlier entry will show the original transport failure (e.g., 'specified network name is no longer available').
  4. This is the true root cause. The DatabaseContext finalizer is a consequence, not the origin.
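When many finalizer messages need triage, this lookup can be scripted. A minimal sketch, assuming the s=N/XX layout shown above; the function name is illustrative:

```python
import re

def finalizer_thread_id(log_line):
    """Extract the original thread ID from a 'DatabaseContext finalizer
    called' log entry.

    The second number in the s=N/XX field identifies the thread whose
    earlier transport failure invalidated the SQL context. Returns the
    thread ID as an int, or None if the pattern is absent.
    """
    m = re.search(r"\bs=\d+/(\d+)\b", log_line)
    return int(m.group(1)) if m else None

entry = ("DatabaseContext finalizer called, which should not happen. "
         "This: [D: 1/65/0] {ConnOwner, Invalid, ReadCommitted, Closed} "
         "id=3268, t='None', s=1/65 id=3267, AdminDatabaseContext")
```

For the sample entry above this yields thread ID 65; searching the same NS log file for earlier errors from that thread leads back to the original transport failure.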

SQL Connectivity Investigation Steps

  1. Search NS logs (filter source: AdminDatabaseContext or EventQueueDispatcher) for the transport error patterns listed above.
  2. Identify the timestamp when errors first appeared. Cross-reference with scheduled policy runs or task executions.
  3. Engage the network/DBA team with exact timestamps to review: SQL Server error logs, Windows Event Log on the SQL Server machine, and network device logs (routers, load balancers) between SMP and SQL Server.
  4. As a short-term mitigation, configure the SmpTopContextMode core setting (see Step 10).
  5. As a recurring mitigation, schedule a periodic restart of AltirisClientMsgDispatcher during off-peak hours (e.g., a nightly maintenance window) to clear any accumulated stalled chains.

⚠️ Standard SQL Ping Tools May Not Detect This

SQL ping tools (including the SQL Test tool in SSETools) may show normal response times while the SMP logs fill with transport errors. The drops are extremely brief — often fractions of a second. Wireshark packet captures between the SMP and SQL servers during an active failure window provide the most reliable evidence. Also review Windows Event Logs and SQL Server logs at the same timestamps when the errors occur in the NS logs.

Step 6 — Enable Extended Verbosity for NSE Logging

For deeper diagnostics, enable extended verbosity in the Altiris Log Viewer:

  1. Open Altiris Log Viewer on the SMP Server.
  2. Navigate to: Options > Extended Verbosities
  3. Enable NSE-related verbosity options.
  4. Reproduce the issue or wait for PerformanceSensor intervals to collect detailed statistics.

Extended verbosity reveals much more detailed PerformanceSensor output and per-chain dispatching information, which is valuable when the standard log is insufficient to diagnose the root cause.




Step 7 — Review the Current Queue Status and Detect Stalled Chains

Use the following query to check for a discrepancy between the file count in EvtQueue and the database. If the database shows processing activity but the EvtQueue folder count is not decreasing, something may be out of sync:

Count of NSEs referenced in the database:

SELECT COUNT(*) FROM EventQueueEntryMetadata

Check if the oldest NSE in the queue is advancing:

SELECT MIN(id) AS Oldest, MAX(id) AS Newest

FROM EventQueueEntry

Run this query twice, approximately 2–3 minutes apart. If the Oldest value does not change between runs, the oldest NSE chain is stalled and not being processed.
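If the two query results are captured programmatically, the comparison reduces to a small check. A sketch; the function name and the (oldest, newest) tuple format are illustrative:

```python
def oldest_is_stalled(sample1, sample2):
    """Decide whether the oldest NSE chain is stalled, given two
    (oldest_id, newest_id) results from the MIN/MAX query taken a few
    minutes apart.

    The queue is considered stalled when entries still exist in the second
    sample and the oldest id has not advanced. New arrivals (a growing
    newest id) do not change the verdict.
    """
    old1, _new1 = sample1
    old2, new2 = sample2
    has_entries = old2 is not None and new2 is not None
    return has_entries and old1 == old2
```

For example, samples (1000, 5000) and (1000, 5200) indicate a stall: NSEs keep arriving, yet the oldest entry never processes.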

Resolving a Small Number of Stuck NSEs (Use Case 4)

This scenario is distinct from a volume-based backlog. A small number of NSEs (fewer than 50) sit in EvtQueue indefinitely, all other queue folders are empty, and the server appears otherwise healthy. The root cause is a SQL transport failure that left internal processing chains marked as 'in progress' in the database, even though no thread is actually working on them.

The AltirisClientMsgDispatcher service contains a built-in consistency-check routine that detects and resolves these stranded chains. This check only runs at service startup — it cannot safely run while NSEs are actively being processed.

🚫 Do NOT Manually Move NSE Files

Do NOT move NSE files manually from the EvtQueue folder to EvtInbox to force reprocessing of a small number of stuck events.

NSEs must be processed in chronological order per resource. Moving them manually bypasses this ordering and can produce data inconsistencies in the Symantec_CMDB database.

The manual move procedure described in KB Troubleshoot NSE Processing in 8.5 and later (KB 172741) is designed only for large-scale NSE backlogs (thousands of files) caused by major events such as SQL crashes or catastrophically misconfigured policies — not for a handful of organically stuck NSEs.

Correct procedure for resolving stuck NSEs:

  1. Confirm the NSEs are stuck by running the MIN/MAX EventQueueEntry query twice, 2–3 minutes apart. Confirm Oldest is not advancing.
  2. Check NS logs for SQL transport-level error signatures (see Step 5).
  3. If transport errors are found, note the timestamp and thread ID for escalation.
  4. Restart the AltirisClientMsgDispatcher service. This triggers the built-in consistency check and resolves stranded processing chains.

Navigate to: Settings > Notification Server > Internals > Core Performance → Click [Restart] on the Client Message Dispatcher row.

  5. After restarting, monitor NS logs for [EventQueueDispatcher] entries. Confirm active thread counts increase and the EvtQueue NSE count decreases.
  6. If SQL transport errors are frequent or recurring, configure SmpTopContextMode = 2 (see Step 10) and engage the network/DBA team.

Step 8 — Check SQL Index Fragmentation on EventQueue Tables

In environments with constant inventory collection or heavy NSE traffic, EventQueue table indexes can become fragmented, adding measurable overhead to NSE processing queries.

Ensure a SQL Maintenance Plan is in place for the Symantec_CMDB database.

Primary tables to monitor for index fragmentation:

  • EventQueueEntry  (most critical)
  • EventQueueProcess  (most critical)
  • EventQueue
  • EventQueueEntryMetaData
  • EventQueueStatus

If NSE processing is slow, run SQL Server's "Rebuild All" or "Reorganize All" operation on the indexes of these Event Queue tables.

ℹ️ Note on Index Fragmentation Impact

Index maintenance can help reduce processing bottlenecks, but the improvement is often modest and temporary in high-volume environments. NSEs are written and removed continuously, so indexes refragment quickly. Rebuilding indexes improves performance at the margin; it does not address the underlying cause of a large backlog.

Step 9 — Review and Tune Thread Pool Settings

Incorrectly configured thread pool settings are a common cause of NSE processing problems — too many threads cause SQL deadlocks and degrade throughput. More threads do not mean faster processing.

Thread Pool Sizing Rule

The combined total of all MaxConcurrent*MsgsThreadPoolSize values should not exceed 50% of the SMP server's available CPU core count for standard environments. For environments with heavy IIS load (many active agent connections), active Persistent Connections (websockets), or ongoing hierarchy replication, consider limiting total threads to one-third (33%) of available CPU cores.

⚠️ CPU Count Changes

If the SMP server's CPU count has been reduced (e.g., due to VM rightsizing), review and reduce thread pool settings proportionally. These settings do not auto-adjust when CPU resources are changed.

| SMP CPU Cores | Standard Max Total Threads (50%) | Heavy IIS/Replication (33%) | Suggested Distribution (Priority / Fast / Default / Slow / Large) |
|---|---|---|---|
| 8 | 4 | 3 | 1 / 1 / 1 / 1 / 0 |
| 16 | 8 | 5 | 2 / 2 / 2 / 1 / 1 |
| 32 | 16 | 10 | 4 / 4 / 4 / 2 / 2 |
| 64 | 32 | 20 | 8 / 8 / 8 / 4 / 4 |
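The sizing rule can be expressed as a quick calculation. A sketch; note that the 33% column in the table is hand-rounded guidance, so the heavy-load result here may differ by a thread at some core counts:

```python
import math

def max_total_threads(cpu_cores, heavy_load=False):
    """Apply the sizing rule above: cap the combined total of all
    MaxConcurrent*MsgsThreadPoolSize values at 50% of CPU cores, or at
    roughly one-third under heavy IIS / Persistent Connections /
    hierarchy replication load.
    """
    fraction = 1 / 3 if heavy_load else 0.5
    # Always allow at least one processing thread.
    return max(1, math.floor(cpu_cores * fraction))
```

Applied to the over-threaded example below (a 32-CPU server configured with 500 total threads), the rule yields a maximum of 16 total threads.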

To configure thread pool settings:

  1. Navigate to: Settings > Notification Server > Core Settings
  2. In the search box, type: msgsthreadpoolsize
  3. Locate and adjust the five settings: MaxConcurrentPriorityMsgsThreadPoolSize, MaxConcurrentFastMsgsThreadPoolSize, MaxConcurrentDefaultMsgsThreadPoolSize, MaxConcurrentSlowMsgsThreadPoolSize, MaxConcurrentLargeMsgsThreadPoolSize
  4. Restart the AltirisClientMsgDispatcher service to apply changes.

Bad Configuration Example

The following is an example of a severely over-threaded configuration observed in a production environment (ITMS 8.7.2, 32-CPU SMP server):

[EventQueueDispatcher] [running, enabled]

[76.91 k / 3.25 GB] => [300 / 462 / 57.08 k @ 500(300) t, 1.2 i/s, 07:00:24]

[Queues]

[0: 21.25 k / 1021.10 MB] => [0: 100 / 156 @ 1(0,0) c / 1.91 k @ 100(100) t, 0.0 i/s, 00:00:25] [priority .. 20 MB]

[1: 52.02 k / 862.45 MB] => [1: 100 / 150 @ 100(98,148) c / 52.42 k @ 100(100) t, 1.0 i/s] [fast .. 244.14 KB]

[2: 3.64 k / 1.41 GB] => [2: 100 / 156 @ 71(50,0) c / 2.51 k @ 100(100) t, 0.2 i/s, 00:00:11] [default .. 4.77 MB]

This configuration uses 100 threads per queue (500 total) on a 32-CPU server. The recommended maximum for this server is 16 total threads. The result is extreme SQL deadlock contention and a processing speed of only 1.2 i/s — despite 300 threads being active.

Correct settings for this server (32 CPUs): Priority: 4 / Fast: 4 / Default: 4 / Slow: 2 / Large: 2 = 16 total.

EvtQueueMaxCount — Limiting Queue Depth

The core setting EvtQueueMaxCount limits the total number of NSEs allowed in the EventQueueDispatcher at one time. When this limit is reached, the queue is marked 'full' and new NSEs are rejected until space is available (the SMA agent will retry sending them).

⚠️ EvtQueueMaxCount Value Format

The value must be entered as a plain integer. Do not use 'k' notation. For example, to set a limit of 50,000, enter 50000, not '50k'.

Recommended starting values:

  • 50000 — for environments experiencing heavy SQL pressure and wanting to limit queue depth aggressively.
  • 100000 — for larger environments (e.g., 50,000+ managed computers) with adequate SQL resources.

Note: MaxFileQSize (previously used to limit queue size) has been deprecated and is no longer effective. Use EvtQueueMaxCount instead.
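The value-format warning above can be encoded as a simple guard. This is a hypothetical validation sketch (not part of the product) showing why '50k' shorthand must be rejected before it reaches Core Settings:

```python
# Sketch of the EvtQueueMaxCount format rule: the value must be a plain
# integer, so shorthand like '50k' is rejected up front.

def validate_evt_queue_max_count(value: str) -> int:
    """Return the parsed limit, or raise ValueError for non-integer input."""
    if not value.isdigit():
        raise ValueError(
            f"EvtQueueMaxCount must be a plain integer (e.g. 50000), "
            f"got {value!r}")
    return int(value)

print(validate_evt_queue_max_count("50000"))   # 50000
try:
    validate_evt_queue_max_count("50k")
except ValueError as e:
    print(e)
```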

Step 10 — Apply the SmpTopContextMode Core Setting

This setting was introduced in SMA_SMP_8_8_PF_v10 (see CUMULATIVE POST ITMS 8.8 RTM(GA) POINT FIXES (KB 400510)). It controls how the SMP manages SQL database connections when intermittent SQL connectivity drops are detected.

| Value | Description | When to Use |
|---|---|---|
| 0 | Conservative — SQL connections opened/closed per operation | Use if instability continues after trying value 2. |
| 1 | Optimized (default) — connections held open for batched operations | Normal operation with stable SQL connectivity. |
| 2 | Balanced — first mitigation step when SQL transport errors are observed | Start here when transport-level SQL errors appear in logs. |

To configure:

  1. Navigate to: Settings > Notification Server > Core Settings
  2. Search for: SmpTopContextMode
  3. If the setting does not appear, confirm that SMA_SMP_8_8_PF_v10 or later is installed.
  4. Set the value to 2.
  5. Monitor NS logs — if transport errors decrease and NSE processing stabilises, no further change is needed.
  6. If instability continues, change to 0 and re-engage the network/DBA team for root cause investigation.

Step 11 — Review Persistent Connections (Websockets) Configuration

If Persistent Connections / Time Critical Management / Endpoint Management Workspaces have been configured, be aware that persistent connections hold open CPU threads on the SMP for the duration of each active connection. This reduces the CPU available for NSE processing.

If Persistent Connections are not required, it is advisable to disable them. If they are required, apply the conservative thread pool sizing (one-third of CPU count) described in Step 9.

Step 12 — Check Disk I/O Performance

Disk I/O issues affect both the physical EvtQueue file writes and the SQL database operations that underpin NSE processing.

  1. Use Perfmon on the SMP and SQL servers to monitor disk queue depth and IOPS.
  2. Confirm that RAID health is nominal and that disks are not running at capacity.
  3. For virtual environments, verify that CPU, RAM, and storage reservations are configured (not just limits — reservations ensure resources are guaranteed during load spikes).
  4. If SQL Server is a VMware virtual machine, confirm VMware Tools are up-to-date. Outdated VMware Tools can cause 'Page I/O Latch' contention in SQL.


Step 13 — Large-Scale NSE Backlog Recovery (Thousands of Stuck NSEs)

If the EventQueueDispatcher has accumulated thousands of NSEs that are not draining — typically caused by a SQL crash, service outage, or catastrophic policy misconfiguration — follow the procedure in:

Troubleshoot NSE Processing in 8.x (KB 172741)

That procedure involves stopping services, moving NSE files from EvtQueue to a temporary folder, truncating SQL event tables, restarting services, monitoring recovery, enabling FlushAgents if necessary, and then copying NSEs back in small batches to EvtInbox.
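The final "copy back in small batches" step can be sketched roughly as follows. The paths, batch size, and pause interval here are illustrative assumptions, not values from KB 172741 — follow the linked article for the authoritative procedure.

```python
# Rough sketch of the batch copy-back step: move NSE files from a temporary
# holding folder into EvtInbox in small batches, pausing between batches so
# the dispatcher can drain. Batch size and pause are assumed values.
import shutil
import time
from pathlib import Path

def copy_back_in_batches(holding_dir: str, evt_inbox: str,
                         batch_size: int = 500,
                         pause_seconds: int = 60) -> int:
    """Return the number of NSE files moved into EvtInbox."""
    src, dst = Path(holding_dir), Path(evt_inbox)
    files = sorted(src.glob("*.nse"))
    moved = 0
    for i in range(0, len(files), batch_size):
        for f in files[i:i + batch_size]:
            shutil.move(str(f), str(dst / f.name))
            moved += 1
        if i + batch_size < len(files):
            # Give the EventQueueDispatcher time to drain before continuing.
            time.sleep(pause_seconds)
    return moved
```

In practice you would watch the EvtQueue file count between batches rather than relying on a fixed pause.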

🚫 This Procedure Is for Large-Scale Backlogs Only

Do NOT apply the Troubleshoot NSE Processing in 8.x (KB 172741) truncation procedure to a small number of stuck NSEs (use case 4). For small counts, restart the AltirisClientMsgDispatcher service as described in Step 7. The truncation procedure is a last resort for catastrophic backlog situations.

Step 14 — Items to Collect for Escalation to Engineering

If the issue cannot be resolved with the steps above, collect the following before escalating:

  • Full NS logs covering the entire period when the issue is (or was) present — not just a few hours. Engineering needs to see day-to-day trends and patterns.
  • Altiris Profiler session — run for 3–5 minutes while the issue is actively present. Navigate to: Settings > Notification Server > Internals > Altiris Profiler.
  • NSE samples — copy of files from C:\ProgramData\Symantec\SMP\EventQueues (if file-based NSEs are stuck).
  • System specifications — detailed hardware for both SMP and SQL servers (physical or virtual, CPU count, RAM, storage type, VM reservation settings).
  • SQL performance data — RAM usage, number of SQL instances and their load, HDD queue depth, IOPS on TempDB.
  • Policy and task schedule list — identify which policies/tasks run on a schedule that could produce large NSE volumes.
  • SQL Server error logs from the same time window as the NS log errors.
  • Network trace (Wireshark) — if SQL transport-level errors are present and the network team cannot identify the cause from standard logs.
  • Core Performance page screenshots — before and after any service restarts. Navigate to: Settings > Notification Server > Internals > Core Performance.

Verification — Confirming NSE Processing Has Recovered

After applying any resolution steps, confirm the issue is resolved using the following checks:

| Verification Check | How to Verify | Expected Healthy Result |
|---|---|---|
| PerformanceSensor — Active Threads | Locate the latest [EventQueueDispatcher] entry in the NS log | Active thread count > 0 for queues with pending NSEs. No queue marked 'full'. Speed > 0 i/s. |
| EventQueueEntry — Oldest ID advancing | Run MIN/MAX query twice, 3 minutes apart | Oldest value changes between runs, confirming dispatcher is progressing. |
| EvtQueue folder — File count decreasing | Check file count in C:\ProgramData\Symantec\SMP\EventQueue\EvtQueue | File count trending downward over time. |
| NS log — No new SQL transport errors | Search NS log for 'transport-level error' after applying mitigations | No new instances appearing in logs after the fix. |
| Core Performance page — Thread counts | Navigate to: Settings > Notification Server > Internals > Core Performance | Active Threads shows non-zero values for queues with pending items. Queue count trending down. |
| Console Reports — Pending Events | Reports > Notification Server Management > Server > Event Queue > Pending Events | Pending NSE count is below 50,000 and decreasing. |
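The "file count decreasing" check can be automated with a small sampling sketch. The function names and the 3-minute interval are illustrative (the interval mirrors the MIN/MAX query guidance above); this is a monitoring aid, not part of the product:

```python
# Illustrative drain check: sample the EvtQueue file count at two points in
# time and confirm the queue is actually shrinking. Names and the default
# interval are assumptions for this sketch.
import time
from pathlib import Path

def nse_file_count(evt_queue_dir: str) -> int:
    """Count .nse files currently sitting in the queue folder."""
    return sum(1 for _ in Path(evt_queue_dir).glob("*.nse"))

def is_draining(evt_queue_dir: str, interval_seconds: int = 180) -> bool:
    """True if the file count drops between two samples taken
    interval_seconds apart (~3 minutes, matching the table above)."""
    before = nse_file_count(evt_queue_dir)
    time.sleep(interval_seconds)
    return nse_file_count(evt_queue_dir) < before
```

A steady or rising count across several samples means the backlog diagnosis steps above still apply.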

Additional Information