A Data Collector (DC) in CA Performance Management (CAPM) that hosts a VNA instance (in this case, with the Versa plugin installed) has been going down very frequently. This is impacting data collection and overall system stability. There are 800 Versa devices provisioned through VNA and over 1,000 SNMP devices polled via the DC.
What is the root cause of these frequent outages, and how can we tell whether the issue is related to the Versa plugin or to an underlying system or resource constraint?
DX NetOps CAPM and VNA: all currently supported releases
Checking the WildFly service status shows that it was killed by the Linux OOM killer:
[root@my_dc]# systemctl status wildfly.service
× wildfly.service - The WildFly Application Server
Loaded: loaded (/etc/systemd/system/wildfly.service; enabled; preset: disabled)
Active: failed (Result: oom-kill) since Sun 2026-05-03 19:02:30 UTC; 8h ago
Duration: 2d 14h 19min 45.044s
Process: 233834 ExecStart=/opt/CA/VNA/wildfly/bin/launch.sh $WILDFLY_MODE $WILDFLY_CONFIG $WILDFLY_BIND (code=killed, signal=TERM)
Main PID: 233834 (code=killed, signal=TERM)
In /var/log/messages, we see:
May 03 19:02:29 my_dc systemd[1]: wildfly.service: A process of this unit has been killed by the OOM killer.
May 03 19:02:30 my_dc systemd[1]: wildfly.service: Failed with result 'oom-kill'.
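The same check can be repeated on other Data Collectors with a small filter. This is only a sketch: `oom_events` is a hypothetical helper name, and it assumes the log lines have the same wording as the two messages shown above.

```shell
# Filter systemd OOM-kill messages for one unit out of a syslog-style stream.
# Reads log lines on stdin and prints only the matching ones.
# Usage: oom_events <unit> < /var/log/messages
oom_events() {
  oe_unit="$1"
  # First keep lines for the requested unit, then keep only the OOM-related ones.
  grep -F "${oe_unit}:" | grep -E "killed by the OOM killer|Failed with result 'oom-kill'"
}
```

For example, `oom_events wildfly.service < /var/log/messages` lists each time the WildFly unit was stopped by the OOM killer; the function returns a non-zero status when nothing matches.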
When the server is under extreme memory pressure, the kernel OOM killer protects the system by killing a process, typically the one consuming the most memory. A JVM can grow its heap up to the full amount specified by -Xmx, so when memory becomes too limited, the Java process with the largest footprint (here, WildFly) is the one that gets killed.
Looking at the processes running on the DC (dcmd, WildFly and MySQL), the maximum memory settings (-Xmx) add up to 65GB.
But the server itself has only 64GB:
# free -t
               total        used        free      shared  buff/cache   available
Mem:        65266896    51589608     1247104      633316     7795816     7677288
Swap:        2097148       24464     2072684
Total:      61164044    57614072     3319788
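To see what each running JVM is actually configured for, the -Xmx flags can be read from the process list. The sketch below is an illustration, not a product tool: `xmx_to_mb` and `list_jvm_heaps` are hypothetical helper names, and it assumes a Linux `ps` that accepts `-eo pid=,args=`. Processes without an -Xmx flag on their command line are not reported.

```shell
# Convert a JVM heap flag such as -Xmx24g or -Xmx16000M to whole megabytes.
xmx_to_mb() {
  x_value="${1#-Xmx}"               # strip the flag name, e.g. "24g"
  x_number="${x_value%[kKmMgG]}"    # strip the unit suffix if present, e.g. "24"
  case "$x_value" in
    *[gG]) echo $(( x_number * 1024 )) ;;
    *[mM]) echo "$x_number" ;;
    *[kK]) echo $(( x_number / 1024 )) ;;
    *)     echo $(( x_number / 1024 / 1024 )) ;;  # no suffix means bytes
  esac
}

# Print "<pid> <heap MB>" for every running process that carries an -Xmx flag.
list_jvm_heaps() {
  ps -eo pid=,args= | while read -r l_pid l_args; do
    # If -Xmx appears more than once, the JVM honours the last occurrence.
    l_flag="$(printf '%s\n' "$l_args" | grep -oE -- '-Xmx[0-9]+[kKmMgG]?' | tail -n 1)"
    if [ -n "$l_flag" ]; then
      echo "$l_pid $(xmx_to_mb "$l_flag")"
    fi
  done
}
```

Running `list_jvm_heaps` on the DC prints one line per JVM; adding up the second column gives the total heap that the Java processes may claim.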
So the memory allocated to these processes is over-subscribed. The sum of the maximum memory settings for the DC (dcmd), VNA (WildFly) and MySQL should stay under MAX - 2GB, where MAX is the maximum memory available on the server.
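The MAX - 2GB rule is simple arithmetic and can be checked with a short script. This is a minimal sketch: `within_budget` and `total_mem_mb` are hypothetical helper names, all values are in MB, and MAX is taken here from the MemTotal line of /proc/meminfo, which is one possible reading of "maximum available memory".

```shell
# Succeeds when the summed allocations stay under total memory minus 2GB of headroom.
# Usage: within_budget <total_mb> <alloc_mb>...
within_budget() {
  wb_total="$1"; shift
  wb_sum=0
  for wb_alloc in "$@"; do
    wb_sum=$(( wb_sum + wb_alloc ))
  done
  wb_limit=$(( wb_total - 2048 ))   # MAX - 2GB, expressed in MB
  echo "sum=${wb_sum}MB limit=${wb_limit}MB"
  [ "$wb_sum" -lt "$wb_limit" ]
}

# Total physical memory in MB, read from /proc/meminfo (MemTotal is reported in kB).
total_mem_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

A call such as `within_budget "$(total_mem_mb)" 16000 <vna_mb> <mysql_mb>` prints the sum and the limit, and returns a non-zero status when the configured maximums exceed the budget; the two placeholders stand for the VNA and MySQL settings found on the system.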
Change the dcmd allocation in /opt/DCM.cfg from its current value (24GB in this example):
IM_MAX_MEM=24000M
to something at least 2GB less. For example, 16GB:
IM_MAX_MEM=16000M
Then restart the dcmd service.
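The edit can also be scripted so that a backup of the original file is kept. The helper below is a sketch under assumptions: `set_im_max_mem` is a hypothetical name, and it assumes the setting sits on its own line beginning with `IM_MAX_MEM=`, as in the example above.

```shell
# Rewrite the IM_MAX_MEM line in a config file, keeping a .bak copy of the original.
# Usage: set_im_max_mem <cfg_file> <value>
set_im_max_mem() {
  sim_cfg="$1"
  sim_value="$2"
  # Keep the previous version next to the file so the change can be rolled back.
  cp -p "$sim_cfg" "${sim_cfg}.bak" || return 1
  # Read from the backup and write the result back in place of the original.
  sed "s/^IM_MAX_MEM=.*/IM_MAX_MEM=${sim_value}/" "${sim_cfg}.bak" > "$sim_cfg"
}
```

For the example in this article the call would be `set_im_max_mem /opt/DCM.cfg 16000M`, followed by the dcmd restart. On a systemd host that restart is typically `systemctl restart dcmd`, assuming the service is registered under that unit name.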