Data Collector with Versa Plugin Installed suffering frequent outages

Article ID: 441001

Products

Network Observability CA Performance Management

Issue/Introduction

A Data Collector (DC) in CA Performance Management (CAPM) that also hosts a VNA instance (in this case, with the Versa plugin installed) has been going down very frequently, impacting data collection and overall system stability. There are 800 Versa devices provisioned in VNA, and over 1,000 SNMP devices are polled via the DC.

What is the root cause of these frequent outages, and how can we determine whether the issue is related to the Versa plugin or to an underlying system or resource constraint?

Environment

DX NetOps CAPM and VNA: all currently supported releases

Cause

Checking the WildFly service status shows that it was killed by the Linux OOM (out-of-memory) killer:

[root@my_dc]# systemctl status wildfly.service
× wildfly.service - The WildFly Application Server
     Loaded: loaded (/etc/systemd/system/wildfly.service; enabled; preset: disabled)
     Active: failed (Result: oom-kill) since Sun 2026-05-03 19:02:30 UTC; 8h ago
   Duration: 2d 14h 19min 45.044s
    Process: 233834 ExecStart=/opt/CA/VNA/wildfly/bin/launch.sh $WILDFLY_MODE $WILDFLY_CONFIG $WILDFLY_BIND (code=killed, signal=TERM)
   Main PID: 233834 (code=killed, signal=TERM)
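The same information can be read non-interactively, for example from a monitoring script, through the unit's `Result` property (`systemctl show wildfly.service --property=Result` prints `Result=oom-kill` in this situation). The following is a minimal Python sketch under that assumption; the helper names `parse_show_output` and `was_oom_killed` are hypothetical, and the unit name is taken from the output above.

```python
import subprocess
from typing import Dict


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key] = value
    return props


def was_oom_killed(unit: str = "wildfly.service") -> bool:
    """Return True if systemd recorded 'oom-kill' as the unit's result."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=Result"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show_output(out).get("Result") == "oom-kill"
```

On the DC, `was_oom_killed()` would return True for the failed state shown above; the parser can be reused for other properties such as `ActiveState`.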


In /var/log/messages, we see:

May 03 19:02:29 my_dc systemd[1]: wildfly.service: A process of this unit has been killed by the OOM killer.
May 03 19:02:30 my_dc systemd[1]: wildfly.service: Failed with result 'oom-kill'.
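On a busy DC these entries can be hard to spot among other messages. Below is a minimal Python sketch that extracts the OOM-kill reports for systemd units from a log, assuming the message wording shown above; the separator after the unit name is matched loosely. The function name `find_oom_kills` is hypothetical. On hosts where the journal is the primary log, `journalctl -u wildfly.service` shows the same messages.

```python
import re
from typing import Iterable, List

# Matches systemd messages such as:
#   "... systemd[1]: wildfly.service: A process of this unit has been killed by the OOM killer."
OOM_PATTERN = re.compile(
    r"(\S+\.service):? ?A process of this unit has been killed by the OOM killer"
)


def find_oom_kills(lines: Iterable[str]) -> List[str]:
    """Return the unit names reported as OOM-killed, in order of appearance."""
    units = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            units.append(match.group(1))
    return units
```

Usage: `with open("/var/log/messages") as f: print(find_oom_kills(f))` (reading this file normally requires root).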

 

When the server is under extreme memory pressure, the kernel OOM killer terminates the process it judges the best victim, which is normally the one consuming the most memory. A JVM can grow its heap up to the full size specified by -Xmx and will use it, so when memory runs short, the largest JVM is the process the OOM killer is most likely to kill in order to protect the kernel.
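The kernel ranks candidate processes by a badness score that it exposes as `/proc/<pid>/oom_score`; higher scores are killed first, and the score is driven mainly by the process's memory footprint. A minimal Python sketch that lists the current top candidates, which shows whether the WildFly, dcmd, or MySQL processes sit at the top of the list. The function name `oom_candidates` is hypothetical; `proc_root` is a parameter only so the function can be exercised against a test directory.

```python
import os
from typing import List, Tuple


def oom_candidates(proc_root: str = "/proc", top: int = 5) -> List[Tuple[int, int, str]]:
    """Return (oom_score, pid, command) for the processes the OOM killer would prefer.

    Reads <proc_root>/<pid>/oom_score and <proc_root>/<pid>/comm; processes that
    exit while being scanned, or whose files cannot be read, are skipped.
    """
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are processes
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue
        results.append((score, int(entry), comm))
    results.sort(reverse=True)  # highest badness score first
    return results[:top]
```

Usage: `for score, pid, comm in oom_candidates(): print(score, pid, comm)`. Java processes usually appear with the command name `java`.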

Looking at the processes running on the DC (dcmd, WildFly, and MySQL), their maximum memory allocations (set with -Xmx for the Java processes) add up to 65GB.
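To total the Java heap limits on a DC, the -Xmx value can be pulled from each JVM's command line (for example from the output of `ps -eo args`). A minimal Python sketch; the function names `parse_size` and `xmx_bytes` are hypothetical. MySQL is not a JVM, so its memory use is governed by its own settings and has to be added to the total separately.

```python
import re
from typing import Optional

# Multipliers for JVM-style size suffixes (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)")


def parse_size(value: str) -> int:
    """Convert a JVM-style size such as '24000M' or '16g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value.strip())
    if not match:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def xmx_bytes(cmdline: str) -> Optional[int]:
    """Return the -Xmx heap limit from a java command line, or None if absent."""
    matches = _XMX.findall(cmdline)
    if not matches:
        return None
    number, unit = matches[-1]  # if the flag is repeated, take the last occurrence
    return int(number) * _UNITS[unit.lower()]
```

Summing `xmx_bytes()` over the JVM command lines, plus the MySQL allocation, gives the figure to compare against physical memory.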

However, the server itself has only 64GB of memory:

# free -t
               total        used        free      shared  buff/cache   available
Mem:        65266896    51589608     1247104      633316     7795816     7677288
Swap:        2097148       24464     2072684
Total:      61164044    57614072     3319788
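To compare the allocations against physical memory in a script, the `free` output can be parsed into a table. A minimal Python sketch; the function name `parse_free` is hypothetical. `free` reports KiB by default, so the `Mem` total of 65266896 KiB above is about 62.2 GiB.

```python
from typing import Dict


def parse_free(output: str) -> Dict[str, Dict[str, int]]:
    """Parse `free` output into {row: {column: value}}, e.g. rows 'Mem', 'Swap', 'Total'."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()  # total, used, free, shared, buff/cache, available
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # zip() stops at the shorter sequence, so Swap/Total rows get only 3 columns.
        table[label.rstrip(":")] = dict(zip(header, (int(v) for v in values)))
    return table
```

Usage: `parse_free(subprocess.run(["free", "-t"], capture_output=True, text=True).stdout)["Mem"]["total"]`.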

The memory allocated to these processes is therefore over-subscribed. The sum of the maximum memory allocations (-Xmx) for the DC, VNA, and MySQL should not exceed MAX - 2GB, where MAX is the maximum available memory on the server.
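The sizing rule can be expressed as a simple headroom calculation. A minimal Python sketch using the figures from this article (64GB of memory and allocations that add up to 65GB); the function name `headroom_gb` is hypothetical, and the 2GB reserve is the margin given in the rule.

```python
from typing import Iterable


def headroom_gb(total_gb: float, allocations_gb: Iterable[float], reserve_gb: float = 2.0) -> float:
    """Headroom under the rule: sum of max allocations <= total memory - reserve.

    A negative result is the amount (GB) by which the host is over-subscribed.
    """
    return (total_gb - reserve_gb) - sum(allocations_gb)


# Figures from this article: 64GB server, allocations summing to 65GB.
before = headroom_gb(64, [65])            # -3.0: over-subscribed by 3GB
# Lowering dcmd from 24GB to 16GB removes 8GB from the total.
after = headroom_gb(64, [65 - 24 + 16])   # 5.0: within budget
```

Any change that brings the headroom to zero or above satisfies the rule; a larger positive value leaves room for growth.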

 

Resolution

Reduce the dcmd allocation in /opt/DCM.cfg from its current value (24GB in this example):

IM_MAX_MEM=24000M

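to a value low enough that the combined allocation no longer exceeds MAX - 2GB. In this example the combined allocation must drop by at least 3GB (from 65GB to 62GB or less), so 16GB leaves a comfortable margin:

IM_MAX_MEM=16000M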
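The edit can also be scripted. A minimal Python sketch, assuming /opt/DCM.cfg holds simple KEY=VALUE lines as shown above; the helper name `set_im_max_mem` and the `.bak` backup convention are hypothetical choices, and dcmd still has to be restarted for the new value to take effect.

```python
import re
import shutil

# Matches an uncommented IM_MAX_MEM=<value> line (leading spaces/tabs allowed).
_IM_MAX_MEM = re.compile(r"^([ \t]*IM_MAX_MEM=)(\S*)", re.MULTILINE)


def set_im_max_mem(path: str, new_value: str) -> str:
    """Set IM_MAX_MEM=<new_value> in a KEY=VALUE config file, keeping a .bak copy.

    Returns the previous value. Raises KeyError if the key is not present.
    """
    with open(path) as f:
        text = f.read()
    match = _IM_MAX_MEM.search(text)
    if match is None:
        raise KeyError("IM_MAX_MEM not found in " + path)
    shutil.copy2(path, path + ".bak")  # keep the original for rollback
    new_text = _IM_MAX_MEM.sub(lambda m: m.group(1) + new_value, text, count=1)
    with open(path, "w") as f:
        f.write(new_text)
    return match.group(2)
```

Usage: `set_im_max_mem("/opt/DCM.cfg", "16000M")` returns the old value (`24000M` in this example).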

Then restart the dcmd service.