The MOM takes more than 15 minutes to start. During startup, the service appears to freeze at the following statement:
06:34:20.987 AM MST [INFO] [main] [Manager.AppMap] Starting ATC Initialization, DB version is 10.7.0.0, Application version is version 10.7.
The following statements then appear in the EM logs, which indicates that the service is still starting:
06:34:25.001 AM MST [DEBUG] [master clock] [Manager.Clock] Total Time for harvest: 0ms
06:34:25.002 AM MST [DEBUG] [master clock] [Manager.Clock] Master harvest took 0ms
06:34:40.000 AM MST [DEBUG] [master clock] [Manager.Clock] Total Time for harvest: 0ms
06:34:40.000 AM MST [DEBUG] [master clock] [Manager.Clock] Master harvest took 0ms
06:34:55.001 AM MST [DEBUG] [master clock] [Manager.Clock] Total Time for harvest: 0ms
After about 15 minutes, the startup process resumes with the following statements:
06:49:30.608 AM MST [DEBUG] [main] [Manager.AppMap.SQLDetail] Query (3319 records) took:117747, query: SELECT e.end_time, s.external_id AS source, s.layer AS s_layer, t.external_id AS target, t.layer AS t_layer, x.external_id AS transaction, x.layer AS x_layer, b.external_id AS backend, b.layer AS b_layer, e.attributes AS attributes FROM appmap_edges e JOIN appmap_id_mappings s ON s.vertex_id = e.source_id AND s.tenant_id = :tenant JOIN appmap_id_mappings t ON t.vertex_id = e.target_id AND t.tenant_id = :tenant LEFT JOIN appmap_id_mappings x ON x.vertex_id = NULLIF(e.transaction_id, 0) LEFT JOIN appmap_id_mappings b ON b.vertex_id = NULLIF(e.backend_id, 0) WHERE e.end_time > :lastTime
10/30/21 06:49:30.700 AM MST [DEBUG] [main] [Manager.AppMap] Status (APM_INFRASTRUCTURE:EM_MOM:[email protected];SuperDomain:APM Health Metrics:Harvest Duration) was changed to: 1
10/30/21 06:49:30.700 AM MST [DEBUG] [main] [Manager.AppMap] Status (APM_INFRASTRUCTURE:EM_MOM:[email protected];SuperDomain:APM Health Metrics:Average Process Time of Simple Alert Status Changes) was changed to: 1
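To confirm how long the `[main]` thread was stalled, the timestamps in the EM log can be compared programmatically. The following is a minimal sketch, assuming the line format shown in the excerpts above; the function name `main_thread_gaps` and the one-minute threshold are hypothetical choices made for illustration, not part of the product.

```python
import re
from datetime import datetime, timedelta

# Matches the time-of-day portion of an EM log line, e.g. "06:34:20.987 AM".
TIME_RE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3} [AP]M)")


def main_thread_gaps(lines, threshold=timedelta(minutes=1)):
    """Return (gap, previous_line, next_line) for each stall on the [main] thread.

    Assumption: only the time of day is parsed, so a stall that crosses
    midnight is not detected by this sketch.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        if "[main]" not in line:
            continue  # ignore [master clock] and other threads
        match = TIME_RE.search(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%I:%M:%S.%f %p")
        if prev_time is not None and current - prev_time >= threshold:
            gaps.append((current - prev_time, prev_line, line))
        prev_time, prev_line = current, line
    return gaps
```

Passing the lines of the EM log file to this function returns the pairs of `[main]` statements that are separated by at least the threshold, which makes it easy to see which step the startup was waiting on.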
What may be causing this problem?
Release : 10.7.0
Component : Introscope
First, check the performance of the whole cluster. See the following KB article:
https://knowledge.broadcom.com/external/article?articleId=9397
If the behavior persists, it is most likely due to the size of your database. You can run a vacuum and the maintenance tool; see:
https://knowledge.broadcom.com/external/article?articleId=48617
https://knowledge.broadcom.com/external/article?articleId=16443
Addressing cluster performance and database size should significantly reduce the EM startup time. If startup still takes too long (for example, between 5 and 9 minutes), the problem most likely lies in one of the following tables:
appmap_model_vertices
appmap_model_edges
appmap_model_attibutes
Run a SELECT against the tables above to determine which one is causing the performance problem. We strongly suggest raising a support ticket and asking for assistance in obtaining this output. Below are example SELECT queries for vertices, which are normally the most common cause:
-- Count model vertices whose external_id maps to no row in appmap_vertices
SELECT COUNT(1) FROM appmap_model_vertices m
WHERE NOT EXISTS (
    SELECT 1 FROM appmap_id_mappings id
    JOIN appmap_vertices v ON v.vertex_id = id.vertex_id
    WHERE id.external_id = m.external_id
);
-- Count vertex ID mappings that point to a vertex that no longer exists
SELECT COUNT(1) FROM appmap_id_mappings m
WHERE m.vertex_id > 0 AND m.type = 'V'
AND NOT EXISTS (SELECT 1 FROM appmap_vertices v WHERE v.vertex_id = m.vertex_id);
If these counts return more than 1 million records, you must delete the obsolete records. The following queries delete them for vertices. Each statement removes at most 10,000 rows per run, so repeat it until it deletes no more rows:
-- Delete up to 10,000 model vertices whose external_id maps to no row in appmap_vertices
DELETE FROM appmap_model_vertices
WHERE external_id IN (
    SELECT m.external_id FROM appmap_model_vertices m
    WHERE NOT EXISTS (
        SELECT 1 FROM appmap_id_mappings id
        JOIN appmap_vertices v ON v.vertex_id = id.vertex_id
        WHERE id.external_id = m.external_id)
    LIMIT 10000);
-- Delete up to 10,000 vertex ID mappings that point to a vertex that no longer exists
DELETE FROM appmap_id_mappings
WHERE vertex_id IN (
    SELECT vertex_id FROM appmap_id_mappings m
    WHERE m.vertex_id > 0 AND m.type = 'V'
    AND NOT EXISTS (SELECT 1 FROM appmap_vertices v WHERE v.vertex_id = m.vertex_id)
    LIMIT 10000);
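Because each DELETE is capped by its LIMIT, clearing more than 1 million obsolete records means running it many times. The loop can be scripted; the following is a minimal sketch, assuming a Python DB-API connection to the APM database (the driver and connection details are not specified in this article). The helper name `delete_in_batches`, the `{batch_size}` placeholder, and the `max_batches` safety cap are hypothetical and introduced only for illustration.

```python
# The vertices DELETE from this article, with the LIMIT made configurable.
DELETE_ORPHAN_MODEL_VERTICES = """
DELETE FROM appmap_model_vertices
WHERE external_id IN (
    SELECT m.external_id FROM appmap_model_vertices m
    WHERE NOT EXISTS (
        SELECT 1 FROM appmap_id_mappings id
        JOIN appmap_vertices v ON v.vertex_id = id.vertex_id
        WHERE id.external_id = m.external_id)
    LIMIT {batch_size})
"""


def delete_in_batches(conn, sql_template, batch_size=10000, max_batches=1000):
    """Repeat a LIMIT-bounded DELETE until it removes no more rows.

    `conn` is assumed to be a DB-API connection. Each batch is committed
    separately so that no single transaction grows too large.
    Returns the total number of rows deleted.
    """
    statement = sql_template.format(batch_size=int(batch_size))
    total = 0
    for _ in range(max_batches):  # safety cap so the loop always terminates
        cursor = conn.cursor()
        cursor.execute(statement)
        deleted = cursor.rowcount
        conn.commit()
        cursor.close()
        if deleted <= 0:
            break
        total += deleted
    return total
```

Committing after every batch keeps each transaction small. Take a database backup before running any bulk delete, and consider running the vacuum mentioned earlier once the cleanup is finished.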