out-of-memory - Layer7 Appliance (OVA) 10.1 - kernel: org.springframe invoked oom-killer



Article ID: 258107



Products

CA API Gateway

Issue/Introduction

During normal operation of a Layer7 v10.1 appliance (OVA form factor), a serious fault shut down the Layer7 API Gateway and its ssg service.

The machine logs indicate that no RAM or swap space was left.

To keep the machine from locking up completely, the kernel OOM killer then terminated the Java process (PID 20697 in the log below) in which the Layer7 API Gateway runs, which also took down the ssg service.

The cause of this abnormal memory exhaustion needs to be identified.

Out-of-memory entries from /var/log/messages:

Jan 10 13:49:48 xxxxxxxxxxx kernel: org.springframe invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

Jan 10 13:49:48 xxxxxxxxxxx kernel: org.springframe cpuset=/ mems_allowed=0

Jan 10 13:49:48 xxxxxxxxxxx kernel: CPU: 6 PID: 21850 Comm: org.springframe Not tainted 3.10.0-1160.11.1.el7.x86_64 #1

Jan 10 13:49:48 xxxxxxxxxxx kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018

Jan 10 13:49:48 xxxxxxxxxxx kernel: Call Trace:

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd80faa>] dump_stack+0x19/0x1b

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd7b8ca>] dump_header+0x90/0x229

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab706602>] ? ktime_get_ts64+0x52/0xf0

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c22dd>] oom_kill_process+0x2cd/0x490

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c1ccd>] ? oom_unkillable_task+0xcd/0x120

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c29ca>] out_of_memory+0x31a/0x500

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd7c3e7>] __alloc_pages_slowpath+0x5db/0x729

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c8f46>] __alloc_pages_nodemask+0x436/0x450

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab818bb8>] alloc_pages_current+0x98/0x110

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7bdd97>] __page_cache_alloc+0x97/0xb0

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c0d30>] filemap_fault+0x270/0x420

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffc0367756>] ext4_filemap_fault+0x36/0x50 [ext4]

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7ee01a>] __do_fault.isra.61+0x8a/0x100

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7ee5cc>] do_read_fault.isra.63+0x4c/0x1b0

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7f5e10>] handle_mm_fault+0xa20/0xfb0

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd8e653>] __do_page_fault+0x213/0x500

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd8e975>] do_page_fault+0x35/0x90

Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd8a778>] page_fault+0x28/0x30

Jan 10 13:49:48 xxxxxxxxxxx kernel: Free swap  = 0kB

Jan 10 13:49:48 xxxxxxxxxxx kernel: Total swap = 2097148kB

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  571]     0   571    12094      283      29       73             0 systemd-journal

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  591]     0   591    11646        1      24      428         -1000 systemd-udevd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  711]     0   711    29161        0      26       96             0 lvmetad

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  834]     0   834    13883       13      26      101         -1000 auditd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  869]    81   869    14530        1      33      157          -900 dbus-daemon

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  883]    38   883    11824       36      28      140             0 ntpd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [  978]     0   978   116410       71      80      425             0 NetworkManager

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1044]   999  1044   153188       16      60     2263             0 polkitd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1115]     0  1115    46498      102      46      173             0 vmtoolsd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1156]     0  1156    14992        0      32      376             0 VGAuthService

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1412]     0  1412    28234        0      57      258         -1000 sshd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1413]     0  1413   105729      251      93     1818             0 rsyslogd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1428]    65  1428   111326        0      49      214             0 nslcd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1440]     0  1440     6655       18      19      109             0 systemd-logind

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1453]     0  1453    31604       16      20      148             0 crond

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1480]  1003  1480    70604       41      52     3420             0 hardserver

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1507]     0  1507    48485        0      54      146             0 su

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1532]  1006  1532  2818477    12512     567   133124             0 java

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1629]     0  1629    22536       11      42      267             0 master

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1631]    89  1631    22606       18      45      266             0 qmgr

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 4543]  1003  4543    10395        1      24     2158             0 hardserver

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 5113]     0  5113    48485        0      52      145             0 su

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 5116]  1005  5116     2954       10      11       91             0 raserv

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 5118]  1004  5118    32938       59      19      361             0 snmpd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [14539]     0 14539    57113      138      65     1004             0 snmpd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [63987]  1007 63987    87341    25238     155    26159             0 splunkd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [63998]  1007 63998    21121       40      34     2710             0 splunkd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [44281]     0 44281    49263        0      52      192             0 su

Jan 10 13:49:48 xxxxxxxxxxx kernel: [44291]  1001 44291  3569259   107042     556    88606             0 java

Jan 10 13:49:48 xxxxxxxxxxx kernel: [48160]     0 48160    28329        0      10       80             0 agetty

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20359]   996 20359    99321       71      24       87             0 oneagentwatchdo

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20368]   996 20368   377504     4110      98     7159             0 oneagentos

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20395]   996 20395   103489     9168     107    35161             0 oneagentnetwork

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20429]   996 20429    43609     1007      17       82             0 oneagenteventst

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20600]   996 20600   137814     2384      58     6334             0 oneagentplugin

Jan 10 13:49:48 xxxxxxxxxxx kernel: [59916]    27 59916   997425    57001     544   145259             0 mysqld

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 8928]     0  8928    41452        9      82      383             0 sshd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 8939]  1002  8939    41452       34      82      369             0 sshd

Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 8940]  1002  8940    29232       13      15      229             0 ssh_force_comma

Jan 10 13:49:48 xxxxxxxxxxx kernel: [15784]     0 15784    32149       47      20       61             0 anacron

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20690]     0 20690    55981      319      64        0             0 sudo

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20694]  1000 20694    29099       96      14        0             0 gateway_control

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20697]  1000 20697  6904255  4553813    9295       75             0 java

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20698]  1000 20698    27792       68      12        0             0 logger

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20699]  1000 20699    29099       96      12        0             0 gateway_control

Jan 10 13:49:48 xxxxxxxxxxx kernel: [20700]  1000 20700    27796       68      12        0             0 cat

Jan 10 13:49:48 xxxxxxxxxxx kernel: [23067]    89 23067    23340      315      45        0             0 pickup

Jan 10 13:49:48 xxxxxxxxxxx kernel: Out of memory: Kill process 20697 (java) score 520 or sacrifice child
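
The process table in the OOM dump can be hard to read by eye. The following is a minimal Python sketch, not part of the product, that ranks the processes in such a dump by resident plus swapped memory. The function names `parse_oom_table` and `top_consumers` are hypothetical, and the conversion assumes that the `rss` and `swapents` columns are page counts with 4 KiB pages (the usual x86_64 default).

```python
import re

# One row of the kernel's OOM task dump, for example:
# "... kernel: [20697]  1000 20697  6904255  4553813    9295       75             0 java"
# The column header and the call-trace lines do not match this pattern.
ROW = re.compile(
    r"kernel: \[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)"
    r"\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)

PAGE_KB = 4  # assumption: 4 KiB pages


def parse_oom_table(lines):
    """Return one dict per process row found in the OOM task dump."""
    rows = []
    for line in lines:
        m = ROW.search(line)
        if m:
            rows.append({
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "rss_kb": int(m.group("rss")) * PAGE_KB,
                "swap_kb": int(m.group("swapents")) * PAGE_KB,
            })
    return rows


def top_consumers(rows, n=3):
    """Return the n rows with the largest resident + swapped memory."""
    return sorted(rows, key=lambda r: r["rss_kb"] + r["swap_kb"], reverse=True)[:n]


if __name__ == "__main__":
    # Assumption: the log is readable at its default location.
    with open("/var/log/messages", errors="replace") as fh:
        for r in top_consumers(parse_oom_table(fh)):
            gib = r["rss_kb"] / 1024 / 1024
            print(f"{r['name']} pid={r['pid']} rss={gib:.1f} GiB swap={r['swap_kb']} kB")
```

Under the 4 KiB page assumption, the `java` row for PID 20697 above (rss 4553813 pages) works out to about 17.4 GiB resident, far more than any other process in the table.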

Environment

Release : 10.1

Resolution

The messages file shows that VMware is ballooning memory out of the virtual machine. This leaves the guest short of memory, so the OOM killer kills processes to free some up:

Jan 10 11:57:52 xxxxxxxxxx kernel: CPU: 4 PID: 5578 Comm: kworker/4:2 Not tainted 3.10.0-1160.11.1.el7.x86_64 #1
Jan 10 11:57:52 xxxxxxxxxx kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Jan 10 11:57:52 xxxxxxxxxx kernel: Workqueue: events_freezable vmballoon_work [vmw_balloon]
Jan 10 11:57:52 xxxxxxxxxx kernel: Call Trace:
Jan 10 11:57:52 xxxxxxxxxx kernel: [<ffffffffabd80faa>] dump_stack+0x19/0x1b


Check with the VM team whether a memory shortage on the ESX side is causing the VMware balloon driver to reclaim memory from the VM, or disable memory ballooning for these virtual machines.
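
To see whether balloon driver activity precedes an OOM event, the messages file can be scanned for entries that mention the driver. The following is a minimal Python sketch with a hypothetical helper, `balloon_events`. It assumes the standard syslog timestamp prefix and treats any line containing `vmw_balloon` or `vmballoon` as balloon-related.

```python
import re

# Assumption: balloon activity shows up in lines naming the driver or its work function.
BALLOON = re.compile(r"vmw_balloon|vmballoon")
# Standard syslog prefix, for example "Jan 10 11:57:52 host kernel: ..."
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s")


def balloon_events(lines):
    """Return (timestamp, line) pairs for log lines that mention the balloon driver."""
    events = []
    for line in lines:
        if BALLOON.search(line):
            m = STAMP.match(line)
            events.append((m.group(1) if m else None, line.rstrip("\n")))
    return events


if __name__ == "__main__":
    # Assumption: the log is readable at its default location.
    with open("/var/log/messages", errors="replace") as fh:
        for stamp, line in balloon_events(fh):
            print(stamp, "|", line)
```

Comparing these timestamps with the `invoked oom-killer` entries gives a rough picture of whether ballooning was active before the Gateway process was killed. Where VMware Tools is installed in the guest, `vmware-toolbox-cmd stat balloon` should also report the amount of memory currently ballooned.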