Data Collector with Versa Plugin Installed suffering frequent outages

Article ID: 441001

Products

Network Observability CA Performance Management

Issue/Introduction

A Data Collector (DC) in CA Performance Management (CAPM) that also hosts a VNA instance (in this case, with the Versa plugin installed) has been going down very frequently. This is impacting data collection and overall system stability. There are 800 Versa devices provisioned through VNA, and over 1,000 SNMP devices are polled via the DC.

What is the root cause of these frequent outages, and how can we identify whether the issue is related to the Versa plugin or to an underlying system or resource constraint?

Environment

DX NetOps CAPM and VNA: all currently supported releases

Cause

Checking the WildFly service status shows that it was killed by the Linux OOM killer:

[root@my_dc]# systemctl status wildfly.service
× wildfly.service - The WildFly Application Server
     Loaded: loaded (/etc/systemd/system/wildfly.service; enabled; preset: disabled)
     Active: failed (Result: oom-kill) since Sun 2026-05-03 19:02:30 UTC; 8h ago
   Duration: 2d 14h 19min 45.044s
    Process: 233834 ExecStart=/opt/CA/VNA/wildfly/bin/launch.sh $WILDFLY_MODE $WILDFLY_CONFIG $WILDFLY_BIND (code=killed, signal=TERM)
   Main PID: 233834 (code=killed, signal=TERM)

In /var/log/messages, we see:

May 03 19:02:29 my_dc systemd[1]: wildfly.service: A process of this unit has been killed by the OOM killer.
May 03 19:02:30 my_dc systemd[1]: wildfly.service: Failed with result 'oom-kill'.
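To see how often this has happened and which unit was hit, the log can be scanned for the systemd message above. The following Python sketch counts OOM-kill messages per systemd unit. It assumes the message wording matches the lines above, and the function name count_oom_kills is hypothetical, not part of any CAPM or VNA tooling.

```python
import re
from collections import Counter

# Matches systemd's OOM message, e.g.
# "wildfly.service: A process of this unit has been killed by the OOM killer."
OOM_PATTERN = re.compile(
    r"(\S+\.service): A process of this unit has been killed by the OOM killer"
)


def count_oom_kills(lines):
    """Hypothetical helper: map each systemd unit name to its number of OOM-kill messages."""
    counts = Counter()
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 03 19:02:29 my_dc systemd[1]: wildfly.service: A process of this "
        "unit has been killed by the OOM killer.",
        "May 03 19:02:30 my_dc systemd[1]: wildfly.service: Failed with result 'oom-kill'.",
    ]
    print(count_oom_kills(sample))  # Counter({'wildfly.service': 1})
```

On the DC itself, the lines could come from open("/var/log/messages"), which typically requires root. Repeated OOM kills against wildfly.service, rather than a plugin-specific error, point toward a memory constraint.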

When the server is under extreme memory pressure, the kernel OOM killer terminates the process it scores highest, which is normally the one consuming the most memory. A JVM can grow its heap up to the full amount specified by -Xmx and will use it, so once memory becomes too limited, the Linux OOM killer protects the kernel by killing the largest consumer, in this case the WildFly process.
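Linux exposes the score the OOM killer uses in /proc/<pid>/oom_score (a higher value means the process is more likely to be killed). The Python sketch below lists the highest-scoring processes so you can see which one the kernel would currently pick. The function name top_oom_candidates is hypothetical, and the proc_root parameter exists only so the function can be tested against a fake directory. Because dcmd and WildFly both run on Java, both may appear under the command name java.

```python
import os


def top_oom_candidates(proc_root="/proc", limit=5):
    """Hypothetical helper: return up to `limit` (oom_score, pid, command)
    tuples for running processes, highest OOM score first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                command = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited or file unreadable; skip it
        results.append((score, int(entry), command))
    results.sort(reverse=True)
    return results[:limit]
```

For example, running `for score, pid, command in top_oom_candidates(): print(score, pid, command)` on the DC shows the current ranking. Pairing the PIDs with `ps -o pid,args` distinguishes dcmd from WildFly.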

Looking at the processes running on the DC (dcmd, WildFly and MySQL), their maximum memory settings (-Xmx for the Java processes) add up to 65GB.

However, the server itself has only 64GB of memory:

# free -t
              total        used        free      shared  buff/cache   available
Mem:       65266896    51589608     1247104      633316     7795816     7677288
Swap:       2097148       24464     2072684
Total:     61164044    57614072     3319788

The memory allocated to these processes is therefore over-subscribed. The sum of the maximum memory settings for the DC (dcmd), VNA (WildFly) and MySQL should stay below MAX - 2GB, where MAX is the maximum available memory on the server.
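The rule can be checked with simple arithmetic. The Python sketch below uses the figures from this article: 65GB combined, 24GB of which is dcmd, on a 64GB server. The function names are hypothetical. With 2GB of headroom the limit is 62GB, so the combined settings must drop by more than 3GB.

```python
# Hypothetical helpers illustrating the "sum of max memory < MAX - 2GB" rule.

def max_memory_budget_gb(server_memory_gb, headroom_gb=2):
    """Exclusive upper bound for the combined max-memory settings, in GB."""
    return server_memory_gb - headroom_gb


def is_within_budget(allocations_gb, server_memory_gb, headroom_gb=2):
    """True if the summed allocations are strictly below server memory minus headroom."""
    return sum(allocations_gb) < max_memory_budget_gb(server_memory_gb, headroom_gb)


# Figures from this article (GB).
SERVER_GB = 64
CURRENT_TOTAL_GB = 65
DCMD_OLD_GB = 24
DCMD_NEW_GB = 16

# WildFly + MySQL combined, derived from the two figures above: 41GB.
other_gb = CURRENT_TOTAL_GB - DCMD_OLD_GB

print(max_memory_budget_gb(SERVER_GB))                       # 62
print(is_within_budget([other_gb, DCMD_OLD_GB], SERVER_GB))  # False: 65 >= 62
print(is_within_budget([other_gb, DCMD_NEW_GB], SERVER_GB))  # True: 57 < 62
```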

Resolution

Change the dcmd memory allocation in /opt/DCM.cfg from its current value (24GB in this example):

IM_MAX_MEM=24000M

to a value low enough that the combined total falls below MAX - 2GB (in this example, below 62GB). For example, 16GB:

IM_MAX_MEM=16000M

Then restart the dcmd service.
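If the same change needs to be applied consistently, for example across several DCs, the edit can be scripted. The following Python sketch rewrites the IM_MAX_MEM line in the file's text. It assumes the file contains a plain, uncommented IM_MAX_MEM=<value>M line as in the examples above, and the function name set_im_max_mem is hypothetical. Keep a backup of /opt/DCM.cfg before writing the result back, and restart dcmd afterwards.

```python
import re

# Matches an uncommented IM_MAX_MEM=... line (assumed KEY=VALUE format).
IM_MAX_MEM_LINE = re.compile(r"^IM_MAX_MEM=.*$", re.MULTILINE)


def set_im_max_mem(config_text, new_value_mb):
    """Hypothetical helper: return config_text with IM_MAX_MEM set to '<new_value_mb>M'.

    Raises ValueError if no IM_MAX_MEM line is present, so a wrong file is
    not silently left unchanged.
    """
    new_text, count = IM_MAX_MEM_LINE.subn(f"IM_MAX_MEM={new_value_mb}M", config_text)
    if count == 0:
        raise ValueError("IM_MAX_MEM line not found")
    return new_text


if __name__ == "__main__":
    before = "IM_MAX_MEM=24000M\n"
    print(set_im_max_mem(before, 16000), end="")  # IM_MAX_MEM=16000M
```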
