Common problems for very large environments

Article ID: 180423

Products

Patch Management Solution for Windows
Management Platform (Formerly known as Notification Server)

Resolution

Question
What kinds of performance and usage problems have been encountered in large environments?

Answer

Recovery Solution
  • By default, the RS database is configured to grow in 1 MB increments, yet the database can easily grow to 50+ GB. All environments can safely change the growth rate to 10% of the prior database size. Because the database file has grown in very small increments, the file is likely to be heavily fragmented on disk. Use traditional disk defragmentation tools to defragment the database file (after temporarily stopping the SQL service).

  • New RS implementations should strongly consider increasing the allocated database file size to 30GB.  This minimizes the file fragmentation issue, and avoids a performance hit that occurs each time the database file size is automatically increased.  Rule of thumb for RS database size is 2-5% of the space used to store the backed-up files.  Smaller environments will be closer to the 5% end of the range.  Large environments will be closer to 2%.
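The sizing rule of thumb and the cost of small growth increments reduce to simple arithmetic. The following Python sketch is illustrative only: the function names, the 1 GB starting size, and the 1,000 GB backup figure used in the note after it are assumptions, not values taken from Recovery Solution.

```python
import math

def estimate_rs_db_size_gb(backed_up_gb, fraction):
    """Estimate RS database size as a fraction (0.02 to 0.05) of backed-up data."""
    if not 0.02 <= fraction <= 0.05:
        raise ValueError("the rule of thumb covers 2% to 5% only")
    return backed_up_gb * fraction

def fixed_growth_events(start_mb, target_mb, increment_mb):
    """Count autogrow events when the file grows by a fixed increment."""
    return max(0, math.ceil((target_mb - start_mb) / increment_mb))

def percent_growth_events(start_mb, target_mb, percent):
    """Count autogrow events when the file grows by a percentage of its current size."""
    size, events = float(start_mb), 0
    while size < target_mb:
        size += size * percent / 100.0
        events += 1
    return events
```

Under these assumptions, 1,000 GB of backed-up files suggests a database of roughly 20 GB in a large environment (2%) and 50 GB in a small one (5%). Growing a 1 GB file to 50 GB takes 50,176 growth events at 1 MB per event, compared with 42 events at 10% per event.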
   
Patch Management
  • New PMimport.cab releases cause a large temporary spike of Inventory Rule retrieval and uploading of new scanning data. This load spike can overwhelm IIS to the point that the NS console is unavailable for 4–8 hours.

    Patch Management 6.2 supports the ability to move the Inventory Rule Web service to a separate application pool. This technique isolates the rest of the Notification Server from the load spike that was overwhelming standard agent and console communications. See article 25655 for implementation instructions.

  • Patch Inventory Rule scanning is too frequent. Avoid using intervals shorter than the default of 4 hours in production environments.
 
 
Application Metering
  • Enabling monitoring of start and stop events for .exe files can overwhelm the server with event traffic. This is not recommended for any customer, but it is particularly painful for large environments.

    The newest version of Application Metering includes some batch upload capabilities that may resolve this concern.

  • Disable the "All Applications" Monitor Policy, because it causes all clients to send summary data for every .exe file.
   
Inventory Solution
  • Using the default of running all Inventory scanning on all computers at the same time each day or week will temporarily flood the NS queues until all NSEs have been processed. To alleviate this, break up inventory scanning into multiple collections that run on different days, or use aexruncontrol.exe to randomize the scan times.

    For implementation details, see article 32175, "How to scale Inventory Solution in very large environments."
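One way to plan the split into multiple collections is to assign each computer to a scan day with a stable hash, so the groups come out roughly even and a computer always lands on the same day. This is a minimal planning sketch in Python. It does not call any Altiris API and does not reproduce what aexruncontrol.exe does; the function names and the five-day week are assumptions.

```python
import hashlib

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def assign_scan_day(computer_name, days=WEEKDAYS):
    """Map a computer name to a scan day using a stable, case-insensitive hash."""
    digest = hashlib.sha256(computer_name.lower().encode("utf-8")).digest()
    return days[int.from_bytes(digest[:4], "big") % len(days)]

def build_collections(computer_names, days=WEEKDAYS):
    """Group computers into one list per scan day."""
    collections = {day: [] for day in days}
    for name in computer_names:
        collections[assign_scan_day(name, days)].append(name)
    return collections
```

The resulting lists could then be used to populate one collection per day, each with its own Inventory schedule.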
   
Asset Management
  • Client-facing Notification Servers with 10,000 or more nodes do not respond quickly (on a consistent basis) due to the inevitable spikes in agent communication and data uploading. The real-time interaction for Asset Management functions involves a lot of data entry. To avoid console performance delays, implement a secondary Notification Server and forward the inventory to the dedicated Reporting/AMS server.
   
Notification Server
  • Collection update intervals are too frequent. The Notification Server and SQL will spend too much processing time rebuilding collections which could be better spent replying to agent requests, processing NSEs, and rendering the Notification Server console. 

    To avoid problems, stagger the delta and collection update schedules, and increase their intervals to 4 hours or more.

  • Agent check-in intervals are too frequent. Agent configuration request processing is usually the highest source of load on the Notification Server. In very large environments, agent policies (Tasks) aren't modified frequently because of change control procedures, so most check-ins retrieve no new configuration data. The Notification Server must nevertheless review all enabled policies that apply to the agent on every request.

    To avoid problems, increase the Altiris Agent check-in interval to a more reasonable setting such as 4–6 hours.

  • Report rendering hurts server performance. By default, the display row count is remembered for all future reports. Customers will frequently set the display row count to "All", which is fine for some smaller reports, but will cause 50,000 rows to be returned for others.

    To avoid performance problems, update a SQL stored procedure as described in article 22542. This will reset the display row count before running each report.

  • Resource Data History tab: In large environments (10,000 or more nodes), the query behind the History tab within the Resource Manager can cause severe CPU, memory, and SQL utilization spikes.

    To avoid the issue, implement a reporting Notification Server (forward the inventory to it), and avoid viewing the Resource History data on the client-facing Notification Server.

  • Debug mode slows IIS and Notification Server response times. Debug mode is a common configuration that can (and should) be safely disabled on any Notification Server. High-traffic environments with multiple Notification Server console users are the most heavily impacted. Follow the instructions provided in article 33499.
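For the collection update and agent check-in items earlier in this list, the effect of a longer interval can be estimated with a little arithmetic. The Python sketch below is a rough model only: it assumes check-ins are spread evenly over the interval, and the function names are illustrative rather than part of any Notification Server interface.

```python
def config_requests_per_hour(agent_count, interval_hours):
    """Average agent configuration requests per hour for a given check-in interval."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return agent_count / interval_hours

def staggered_start_minutes(schedule_count, window_minutes):
    """Evenly spaced start offsets (in minutes) for schedules sharing one window."""
    if schedule_count <= 0:
        raise ValueError("schedule_count must be positive")
    step = window_minutes / schedule_count
    return [round(i * step) for i in range(schedule_count)]
```

With 10,000 agents, moving from a 1-hour to a 4-hour check-in interval reduces the average from 10,000 to 2,500 configuration requests per hour. Four update schedules spread across a 4-hour (240-minute) window would start at offsets of 0, 60, 120, and 180 minutes.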