Steps taken:
-
Set hub_bulk_size in data_engine to 1750 (from 2000).
-
Set thread_count_insert to 24 from 5. There are 24 virtual processors on the Primary hub.
-
Cold started data_engine (Deactivate-Activate)
-
data_engine cleared in about 20 minutes. ~43 million messages.
UIM Database-> 1614 tables. MOST of the tables have data from 3/5/2021 up until today and a few back to 3/6/2021.
Set raw and historical retention from its original of 60 and 60 to 25 and 25 (customer-set), but then I reset it to 35, 180 respectively for raw and historical.
Ran select top (1) sampletime from RN_QOS_DATA_0001 and other RN tables to gauge the oldest data which was only 30/31 days so it was safe to set the data_engine retention values lower without ending up with long running hung jobs.
SQL Server had a hung job which finally finished. This was the job.
delete from ssrv2audittrail where id in ( select top(@P0) st.id from ssrv2audittrail st where st.timestamp < @P1 order by st.timestamp asc )
Some data_engine log entries showed LONG INSERT times for QOS_INTERFACE_* data, e.g., 3331177 = 55 minutes to insert that data for QOS_INTERFACE_BYTESOUT and other QOS_INTERFACE_* tables.
Note that via the data_engine GUI, you can double-click on a given QOS object so specific QOS can be pruned at a lower, more conservative setting, e.g., 5 days for raw, 15 for historical to keep the size of the tables down. Most of the Interface tables were very large (21 Tables). Tables may be highly fragmented.
During the time period in which the data_engine was highly queued up, the udm_manager probe turned red
udm_manager probe was constantly restarting, unable to connect so we adjusted the udm_manager.cfg settings (see below) and reinstalled it as well and it recovered.
memory_index_max = 516m
memory_index_threshold = 64m
object_cache_max = 256m
heartbeat_interval_msec = 30000
Truncated super large MCS audit tables:
TRUNCATE TABLE SSRV2AuditTrailModification;
ALTER TABLE SSRV2AuditTrailModification DROP CONSTRAINT FK_SSRV2AuditTrailModification_SSRV2AuditTrail;
TRUNCATE TABLE SSRV2AuditTrail;
ALTER TABLE SSRV2AuditTrailModification WITH NOCHECK ADD CONSTRAINT FK_SSRV2AuditTrailModification_SSRV2AuditTrail FOREIGN KEY(auditrecord)
REFERENCES SSRV2AuditTrail(id);
ALTER TABLE SSRV2AuditTrailModification NOCHECK CONSTRAINT FK_SSRV2AuditTrailModification_SSRV2AuditTrail;
Other recommendations provided:
1. 64 GB RAM is dedicated to SQL Server but usage is 57 out of 64GB so recommended setting it to 85 GB RAM to be dedicated to the UIM DB.
2. Advised to keep track of the size (number of rows) of the SSRV2AuditTrail and SSRV2AuditTrailModification tables to make sure they are now being pruned every 30 minutes.
3. Keep track of the UIM database .MDF file growth periodically, and if it's still growing too fast or growing too large unexpectedly, ask DBA to check the file to see if they can determine the source of the file growth and open a support case.