Memory usage climbs steadily on 4 collectors and is currently at 90%. Restarting brings it back to a usable level.
After analyzing the logs and the large files from the customer environment, we identified the root cause of the growing ActiveMQ queue size.
On every subscription failure, the OC framework re-triggers the notification subscription (every 50 seconds). In the existing ACI plugin, the alarm subscription does not maintain the last sync time, so after each failure OC republishes the full set of alarms from both APICs. The queue grows rapidly and, at a certain point, the messaging queue becomes unresponsive. The subscription failures are caused by frequent disconnections from the load balancer, which leave the APIC unresponsive during subscription renewals.
Release : 21.2.2
Component : Virtual Network Assurance For CA Spectrum
We will provide a fix that honors the time filter value and does not fetch alarms that have already been processed, so that the message queue no longer grows with each subscription restart. The fix for the defect will be part of the 21.2.5 monthly kit, and the tentative date for customer download is around 10th November.
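The idea behind the fix can be illustrated with a minimal Python sketch. This is not the plugin's actual code: the Alarm and AlarmSubscription classes, their fields, and the resubscribe method are hypothetical names used only to show how remembering a last sync time keeps a re-subscription from republishing alarms that were already queued.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Alarm:
    # Hypothetical alarm record; the field names are illustrative only.
    alarm_id: str
    raised_at: datetime


@dataclass
class AlarmSubscription:
    # Timestamp of the newest alarm already published; None before the first sync.
    last_sync: Optional[datetime] = None
    # Stand-in for the message queue that receives published alarms.
    queue: List[Alarm] = field(default_factory=list)

    def resubscribe(self, all_alarms: Iterable[Alarm]) -> int:
        """Publish only alarms newer than last_sync and return how many were queued."""
        new_alarms = [
            alarm
            for alarm in all_alarms
            if self.last_sync is None or alarm.raised_at > self.last_sync
        ]
        self.queue.extend(new_alarms)
        if new_alarms:
            # Advance the time filter so the next restart skips these alarms.
            self.last_sync = max(alarm.raised_at for alarm in new_alarms)
        return len(new_alarms)
```

Without the last_sync check, every call to resubscribe would append the complete alarm list again, which matches the queue growth described in the root cause.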
Suggested cleanup (as a workaround) until the fix is available and production is upgraded with the patch:
a. Stop the WildFly server
b. Delete all files from /opt/CA/VNA/data/updates
c. Delete the activemq folder from /opt/CA/VNA/wildfly/standalone/data
d. Delete the tmp and log folders from /opt/CA/VNA/wildfly/standalone
e. Start the WildFly server
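
The three deletion steps could be scripted along the following lines. This is only a sketch, assuming the default install path shown above. The clean_vna_data function is a hypothetical helper, not a tool shipped with the product, and it deliberately does not stop or start WildFly. Stop the server first and start it afterwards using the method normally used in your environment.

```python
import shutil
from pathlib import Path


def clean_vna_data(vna_home: Path = Path("/opt/CA/VNA")) -> None:
    """Perform the deletion steps of the workaround. WildFly must already be stopped."""
    standalone = vna_home / "wildfly" / "standalone"

    # Delete everything inside data/updates but keep the directory itself.
    updates = vna_home / "data" / "updates"
    if updates.is_dir():
        for entry in updates.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Delete the activemq folder under standalone/data.
    shutil.rmtree(standalone / "data" / "activemq", ignore_errors=True)

    # Delete the tmp and log folders under standalone.
    for name in ("tmp", "log"):
        shutil.rmtree(standalone / name, ignore_errors=True)


if __name__ == "__main__":
    clean_vna_data()
```

Folders that are already missing are skipped silently, so the script can be rerun safely if it is interrupted.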