Optimizing DX UIM Operator Console - Performance troubleshooting guide

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

Users may experience slow performance in the DX UIM Operator Console (OC), particularly long 'Loading...' times and delays when displaying metrics. The slowness may appear in one or more OC webapps, including PRD reports, List Views, and Metrics Viewer displays.
This article provides a checklist to help customers and Support collect and analyze more complete data, identify the factors affecting OC response times, and reduce or eliminate OC performance issues. Work closely with your Database Administrator (DBA) on most of the factors listed below, especially those concerning the database server, its maintenance, and administration.
The overarching question this article aims to address is: How can we analyze and optimize OC response times?

Environment

DX UIM 20.4 or higher
Operator Console (OC)

Cause

Guidance

Resolution

Analyzing Operator Console Slowness

Try to pinpoint where the Operator Console slowness occurs: for example, during the 'Loading...' data phase, or only in specific views, PRD/List reports, or specific probe metrics.
- How many seconds/minutes is the data presentation delayed?
- Is the performance issue consistent, intermittent, or 'random'?
- What are the circumstances under which the issue happens?
- Can the issue be reproduced?
- Ask your DBA to use SQL Server Profiler or an equivalent database tool to identify long-running queries or jobs that consume significant CPU, memory, disk I/O, or overall run time. You can also use the sqlserver probe to monitor database server performance.
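
As a lighter-weight complement to Profiler, the DBA could start from the plan-cache statistics that SQL Server already keeps. The following is a minimal sketch, not a definitive diagnostic: it lists the 20 cached statements with the highest cumulative elapsed time. The TOP (20) cutoff and the sort column are arbitrary choices for illustration, and the figures only cover plans still in the cache since the last restart or cache eviction.

```sql
-- Top cached statements by cumulative elapsed time (times are stored in microseconds).
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time / 1000                       AS total_cpu_ms,
    qs.total_elapsed_time / 1000                      AS total_elapsed_ms,
    qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
    qs.total_logical_reads,
    qs.total_physical_reads,
    -- Extract the individual statement from the batch text (offsets are in bytes).
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1)  AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```

Sorting by total_worker_time or total_logical_reads instead highlights CPU-heavy or I/O-heavy statements respectively.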

System Requirements and Configurations

Have the DX UIM, Operator Console (OC), and database server sizing and hardware requirements been followed?
- DX UIM Sizing Recommendations
What type/level of data storage is in use: Tier 0, Tier 1, or Tier 2? (Tier 1 is recommended.)
- Tier 1 storage is appropriate for mission-critical data. You can use fast drives, all-flash arrays (AFA), and hybrid-flash storage to store data in Tier 1.
- This tier stores backup for mission-critical and business-critical data. Tier 1 data storage is designed for data that is highly time-sensitive and volatile and must be accessed quickly, as close to real time as possible.
Are SSD drives or RAID 5 or higher drives in use?

Log and Debugging Information

Examine the data_engine log at the time that corresponds to the slow performance.
- Set loglevel to at least 3. Loglevel 5 provides more detailed debug output and is recommended.
- Set the data_engine logsize parameter to at least 300000.
- Have you checked the actual memory consumption of wasp via the logs at loglevel 5?
  - Note that it's best not to configure wasp with much more Java heap memory than it actually needs.
  - Check for 'OutOfMemory' exceptions and for how much memory is available after wasp startup. Use those figures as a guideline for setting the Java min and max memory, and leave at least 1-2 GB of 'breathing room.'

MS SQL Server UIM Databases

Make sure the latest cumulative Microsoft SQL Server update has been applied to the database server, e.g., CU22 as of Sept. 2023.
Has the database server been adjusted according to documented best practices for MS SQL Server for DX UIM?
- DX UIM (Nimsoft) Database Best Practices for MS SQL Server
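
To confirm which build, update level, and edition the database server is actually running before comparing it against the latest cumulative update, a query along these lines may help. It is a small sketch that uses only the built-in SERVERPROPERTY function; ProductUpdateLevel can return NULL on builds that do not report a CU level.

```sql
-- Report the SQL Server build, servicing level, and edition of the connected instance.
SELECT
    SERVERPROPERTY('ProductVersion')     AS product_version,      -- e.g. major.minor.build.revision
    SERVERPROPERTY('ProductLevel')       AS product_level,        -- RTM or SPn
    SERVERPROPERTY('ProductUpdateLevel') AS product_update_level, -- CUn, or NULL if not reported
    SERVERPROPERTY('Edition')            AS edition;              -- Enterprise, Standard, ...
```

The edition column is also relevant to the Enterprise versus Standard recommendation later in this article.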

What SQL Server driver is configured and being used in the data_engine?
Has the large tables query been run to confirm how many tables are large and what their row count is?

Query to determine large tables:

-- Row counts and space usage per user table, largest first.
SELECT
    t.name AS TableName,
    s.name AS SchemaName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.name NOT LIKE 'dt%'
    AND t.is_ms_shipped = 0
    AND i.object_id > 255
GROUP BY t.name, s.name, p.rows
ORDER BY RowCounts DESC

Recommendations

Database Type (Enterprise versus Standard)
- Enterprise Edition is highly recommended. (The UIM data_engine ONLY supports table partitioning with Enterprise editions.)

System Updates

Has the latest DX UIM Cumulative Update been applied?
- Please visit DX Unified Infrastructure Management - Cumulative Updates & Patches
- If you have any questions about the cumulative updates, please contact Support.

Database Maintenance and Checks

Have you or your DBA checked the size of the database transaction log file? Is it being managed?
- A DBA should be maintaining the transaction log (for example, with regular log backups) if the 'Full' recovery model is configured.
Is there any blocking or locking on the database server at the time of the issue?
- For MS SQL Server, please consult with your DBA on the use of the following suggestions.

To list transaction log space usage statistics for all databases, run the command DBCC SQLPERF(logspace) in SQL Server Management Studio (SSMS). In the results, see the 'Log Space Used (%)' column.

DBCC SQLPERF(logspace)
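
If the log space figures look high, it can help to see which recovery model each database uses and what, if anything, is preventing log truncation. The sketch below reads both from the sys.databases catalog view. It is offered as a starting point for a conversation with the DBA rather than a complete log-management check.

```sql
-- Recovery model and current log-reuse wait reason for every database on the instance.
SELECT
    name,
    recovery_model_desc,   -- FULL, BULK_LOGGED, or SIMPLE
    log_reuse_wait_desc    -- e.g. NOTHING, LOG_BACKUP, ACTIVE_TRANSACTION
FROM sys.databases
ORDER BY name;
```

A value of LOG_BACKUP under the FULL recovery model typically means the log is waiting on a transaction log backup before space can be reused.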

-- Find locked tables and queries causing any locking issues.
SELECT L.request_session_id AS SPID,
    DB_NAME(L.resource_database_id) AS DatabaseName,
    O.Name AS LockedObjectName,
    P.object_id AS LockedObjectId,
    L.resource_type AS LockedResource,
    L.request_mode AS LockType,
    ST.text AS SqlStatementText,
    ES.login_name AS LoginName,
    ES.host_name AS HostName,
    TST.is_user_transaction as IsUserTransaction,
    AT.name as TransactionName,
    CN.auth_scheme as AuthenticationMethod
FROM sys.dm_tran_locks L
JOIN sys.partitions P ON P.hobt_id = L.resource_associated_entity_id
JOIN sys.objects O ON O.object_id = P.object_id
JOIN sys.dm_exec_sessions ES ON ES.session_id = L.request_session_id
JOIN sys.dm_tran_session_transactions TST ON ES.session_id = TST.session_id
JOIN sys.dm_tran_active_transactions AT ON TST.transaction_id = AT.transaction_id
JOIN sys.dm_exec_connections CN ON CN.session_id = ES.session_id
CROSS APPLY sys.dm_exec_sql_text(CN.most_recent_sql_handle) AS ST
WHERE resource_database_id = db_id()
ORDER BY L.request_session_id

Use the script: sp_blocker_pss08 or SQL Trace/Profiler and the Blocked Process Report event class.
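
For a quick look at blocking while the slowness is happening, without setting up a trace, something like the following could be run. It is a minimal sketch based on the sys.dm_exec_requests view and shows only requests that are blocked at the moment of execution, so it may need to be run several times during the problem window.

```sql
-- Requests that are currently blocked, with the session that is blocking them.
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time              AS wait_time_ms,
    r.wait_resource,
    r.status,
    r.command,
    DB_NAME(r.database_id)   AS database_name,
    st.text                  AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
```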

Check for fragmentation of key tables (CM_*, S_QOS_*, NAS_*)
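
One way to measure fragmentation on these tables is the sys.dm_db_index_physical_stats function. The sketch below is illustrative and should be run in the UIM database. The 30% fragmentation and 1000-page thresholds are commonly used rules of thumb rather than values taken from this article, and the name filters simply mirror the CM_*, S_QOS_*, and NAS_* prefixes mentioned above.

```sql
-- Fragmented indexes on key UIM tables in the current database.
-- 'LIMITED' mode scans only the upper index levels, which keeps the check cheap.
SELECT
    OBJECT_NAME(ips.object_id)        AS table_name,
    i.name                            AS index_name,
    ips.index_type_desc,
    ips.partition_number,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.page_count > 1000                    -- assumed threshold: ignore tiny indexes
  AND ips.avg_fragmentation_in_percent > 30    -- assumed threshold: rebuild candidates
  AND (OBJECT_NAME(ips.object_id) LIKE 'CM[_]%'
    OR OBJECT_NAME(ips.object_id) LIKE 'S[_]QOS[_]%'
    OR OBJECT_NAME(ips.object_id) LIKE 'NAS[_]%')
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

The [_] pattern matches a literal underscore, since a bare underscore is a single-character wildcard in LIKE.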

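A daily job should be run to defragment specific tables. For more information, refer to:

DX UIM - data_engine indexing option - Index Maintenance

Specifically, the following statements may help defragment the indexes on tables commonly used by Operator Console. Rebuilding these indexes is generally low-risk and often improves performance. Note that ONLINE = ON is supported only in SQL Server Enterprise edition (and the equivalent Developer and Evaluation editions); on other editions, remove the WITH (ONLINE = ON) clause and run the rebuilds during a maintenance window, because offline rebuilds lock the table.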

ALTER INDEX ALL ON CM_COMPUTER_SYSTEM REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_DEVICE REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_COMPUTER_SYSTEM_ATTR REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_DEVICE_ATTRIBUTE REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_METRIC REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_DEFINITION REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_METRIC_DEFINITION REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_NIMBUS_ROBOT REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_COMPUTER_SYSTEM_ORIGIN REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_ATTRIBUTE REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_RELATIONSHIP_CI_CI REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_RELATIONSHIP_CI_CS REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_RELATIONSHIP_CS_CI REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON CM_DISCOVERY_NETWORK REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON S_QOS_DATA REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2PolicyTargetStatus REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2Device REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2ProfileCheckSum REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2Profile REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2ConfigValue REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2DeviceGroup REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2AuditTrail REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2Template REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2Container REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON SSRV2DevicePackage REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON NAS_TRANSACTION_LOG REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON NAS_TRANSACTION_SUMMARY REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON NAS_ALARMS REBUILD WITH (ONLINE = ON);

MS SQL Server Memory

Has the DBA/VMware admin checked SQL Server 'memory pressure' on the database server over time? Ensure that the memory dedicated to SQL Server is at least 10 GB above the average memory usage observed over time.
- Memory Pressure Affecting Queries
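
As a rough first check for memory pressure, the DBA might sample the operating-system and process memory views along with the Page life expectancy counter. The three queries below are a sketch using standard dynamic management views. They show a point-in-time snapshot, so they would need to be sampled repeatedly (or trended with the sqlserver probe) to reflect usage over time.

```sql
-- 1. Operating-system view of memory on the database server.
SELECT
    total_physical_memory_kb / 1024     AS total_physical_mb,
    available_physical_memory_kb / 1024 AS available_physical_mb,
    system_memory_state_desc
FROM sys.dm_os_sys_memory;

-- 2. Memory used by the SQL Server process, plus its low-memory flags.
SELECT
    physical_memory_in_use_kb / 1024 AS sql_memory_in_use_mb,
    memory_utilization_percentage,
    process_physical_memory_low,
    process_virtual_memory_low
FROM sys.dm_os_process_memory;

-- 3. Page life expectancy (seconds a page is expected to stay in the buffer pool).
SELECT object_name, counter_name, cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Page life expectancy%'
  AND object_name LIKE '%Buffer Manager%';
```
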
How many large tables (> 50 million rows) exist?
- How to find the TOP 10 largest tables in your UIM Database
Has the partitioning query been run to confirm that partitioning is working?
- Check if the data_engine raw and historical data retention settings are being adhered to...
- How to determine if data_engine partitioning is working as expected in DX UIM
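
Independently of the linked article, a generic catalog query can show which tables are physically partitioned and how many partitions each has. This is a sketch that relies only on sys.tables and sys.partitions. It does not verify retention settings and makes no assumption about how data_engine names its partitions.

```sql
-- Tables in the current database that have more than one partition.
SELECT
    t.name        AS table_name,
    COUNT(*)      AS partition_count,
    SUM(p.rows)   AS total_rows
FROM sys.tables AS t
JOIN sys.partitions AS p
    ON p.object_id = t.object_id
   AND p.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY t.name
HAVING COUNT(*) > 1
ORDER BY partition_count DESC;
```

An empty result would suggest that no table in the database is currently partitioned.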

Resource and Hardware Monitoring

Check the resource utilization trend on the database server and the OC machine: CPU, memory, disk I/O, network speed/available bandwidth, and any proxy in the path.
Is enough memory dedicated to SQL Server, e.g., 64 GB RAM or more?
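
To see how much memory SQL Server is currently allowed to use, the configured limits can be read from sys.configurations. This is a small sketch; the values are reported in MB, and a very large max server memory value generally indicates that the default (effectively unlimited) setting has not been changed.

```sql
-- Configured SQL Server memory limits (in MB).
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN ('min server memory (MB)', 'max server memory (MB)');
```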

Connectivity Checks

Is the database server on the same subnet as the Primary hub? This is essential for Primary hub <-> database server connectivity and performance.
Placing the database server on a different subnet than the Primary hub and OC is not recommended.
When you run a traceroute/tracert command, is the database server only 1-2 hops away from the Primary hub?

Operator Console (OC) Robot Checks

What is the OC Robot processor speed? Ideally, it should be 3 GHz or higher.
What is the number of virtual processors configured on the OC Robot VM?
- Please ensure it is 4 virtual processors or higher.

Monitoring Practices

Have you confirmed that Nimsoft/UIM is completely excluded from antivirus (AV) blocking/scanning?
- On Windows, check the Application and System event logs to make sure that nothing is being blocked.
- Note that even Informational-level log events may reveal a block, interference, lock, or scan.

Have you compared the OC Robot wasp Java heap memory utilization reported in wasp.log against what is configured in wasp.cfg?
- Have you confirmed the wasp memory configured versus what is actually used over time?
  - For example, using the processes probe, configure a wasp java process profile for memory utilization in % and view it in the PRD.
    - Examine the trend and results over time.
Monitoring Governance considerations:
- Is too much non-critical or unnecessary QOS data, or too many alarms, being collected?
- Are monitoring/polling intervals too short (too frequent) for some QOS?
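
To get a rough picture of where QOS volume comes from, the number of QOS series can be counted per probe and QOS name. The sketch below is an assumption-laden illustration: it assumes that the S_QOS_DATA table exposes probe and qos columns, which should be confirmed against your UIM schema before relying on it. It counts defined series, not the number of samples stored.

```sql
-- Count of QOS series per probe and QOS name (column names probe and qos are assumed).
SELECT
    probe,
    qos,
    COUNT(*) AS series_count
FROM S_QOS_DATA
GROUP BY probe, qos
ORDER BY series_count DESC;
```
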
Have you tested the slowness issue using different browsers, after deleting the browser cache, closing all browsers, and starting a fresh browser session?

Additional Information

How to generate a .har file for DX UIM Support Cases