During normal operation of the Layer7 v10.1 appliances (deployed from the OVA image), we encountered a serious anomaly that led to the shutdown of the Layer7 API Gateway product and of the ssg service.
As can be seen from the machine's logs, there was no longer any available space in RAM or swap.
At that point the OOM killer, in order to keep the machine from locking up completely, began killing processes, including the Java process, thereby also killing the ssg process in which the Layer7 API Gateway product runs.
We ask you to analyze with us the causes of this abnormal malfunction.
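For reference, the OOM-killer events can be extracted from the system log with something like the following (a minimal sketch, assuming the standard RHEL 7 log location):

    # list OOM-killer invocations and the processes that were killed
    grep -E 'oom-killer|Out of memory' /var/log/messages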
Excerpt from /var/log/messages showing the out-of-memory event:
Jan 10 13:49:48 xxxxxxxxxxx kernel: org.springframe invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jan 10 13:49:48 xxxxxxxxxxx kernel: org.springframe cpuset=/ mems_allowed=0
Jan 10 13:49:48 xxxxxxxxxxx kernel: CPU: 6 PID: 21850 Comm: org.springframe Not tainted 3.10.0-1160.11.1.el7.x86_64 #1
Jan 10 13:49:48 xxxxxxxxxxx kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Jan 10 13:49:48 xxxxxxxxxxx kernel: Call Trace:
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd80faa>] dump_stack+0x19/0x1b
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd7b8ca>] dump_header+0x90/0x229
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab706602>] ? ktime_get_ts64+0x52/0xf0
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c22dd>] oom_kill_process+0x2cd/0x490
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c1ccd>] ? oom_unkillable_task+0xcd/0x120
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c29ca>] out_of_memory+0x31a/0x500
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd7c3e7>] __alloc_pages_slowpath+0x5db/0x729
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c8f46>] __alloc_pages_nodemask+0x436/0x450
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab818bb8>] alloc_pages_current+0x98/0x110
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7bdd97>] __page_cache_alloc+0x97/0xb0
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7c0d30>] filemap_fault+0x270/0x420
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffc0367756>] ext4_filemap_fault+0x36/0x50 [ext4]
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7ee01a>] __do_fault.isra.61+0x8a/0x100
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7ee5cc>] do_read_fault.isra.63+0x4c/0x1b0
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffab7f5e10>] handle_mm_fault+0xa20/0xfb0
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd8e653>] __do_page_fault+0x213/0x500
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd8e975>] do_page_fault+0x35/0x90
Jan 10 13:49:48 xxxxxxxxxxx kernel: [<ffffffffabd8a778>] page_fault+0x28/0x30
…
Jan 10 13:49:48 xxxxxxxxxxx kernel: Free swap = 0kB
Jan 10 13:49:48 xxxxxxxxxxx kernel: Total swap = 2097148kB
…
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 571] 0 571 12094 283 29 73 0 systemd-journal
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 591] 0 591 11646 1 24 428 -1000 systemd-udevd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 711] 0 711 29161 0 26 96 0 lvmetad
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 834] 0 834 13883 13 26 101 -1000 auditd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 869] 81 869 14530 1 33 157 -900 dbus-daemon
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 883] 38 883 11824 36 28 140 0 ntpd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 978] 0 978 116410 71 80 425 0 NetworkManager
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1044] 999 1044 153188 16 60 2263 0 polkitd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1115] 0 1115 46498 102 46 173 0 vmtoolsd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1156] 0 1156 14992 0 32 376 0 VGAuthService
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1412] 0 1412 28234 0 57 258 -1000 sshd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1413] 0 1413 105729 251 93 1818 0 rsyslogd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1428] 65 1428 111326 0 49 214 0 nslcd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1440] 0 1440 6655 18 19 109 0 systemd-logind
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1453] 0 1453 31604 16 20 148 0 crond
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1480] 1003 1480 70604 41 52 3420 0 hardserver
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1507] 0 1507 48485 0 54 146 0 su
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1532] 1006 1532 2818477 12512 567 133124 0 java
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1629] 0 1629 22536 11 42 267 0 master
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 1631] 89 1631 22606 18 45 266 0 qmgr
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 4543] 1003 4543 10395 1 24 2158 0 hardserver
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 5113] 0 5113 48485 0 52 145 0 su
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 5116] 1005 5116 2954 10 11 91 0 raserv
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 5118] 1004 5118 32938 59 19 361 0 snmpd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [14539] 0 14539 57113 138 65 1004 0 snmpd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [63987] 1007 63987 87341 25238 155 26159 0 splunkd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [63998] 1007 63998 21121 40 34 2710 0 splunkd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [44281] 0 44281 49263 0 52 192 0 su
Jan 10 13:49:48 xxxxxxxxxxx kernel: [44291] 1001 44291 3569259 107042 556 88606 0 java
Jan 10 13:49:48 xxxxxxxxxxx kernel: [48160] 0 48160 28329 0 10 80 0 agetty
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20359] 996 20359 99321 71 24 87 0 oneagentwatchdo
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20368] 996 20368 377504 4110 98 7159 0 oneagentos
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20395] 996 20395 103489 9168 107 35161 0 oneagentnetwork
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20429] 996 20429 43609 1007 17 82 0 oneagenteventst
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20600] 996 20600 137814 2384 58 6334 0 oneagentplugin
Jan 10 13:49:48 xxxxxxxxxxx kernel: [59916] 27 59916 997425 57001 544 145259 0 mysqld
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 8928] 0 8928 41452 9 82 383 0 sshd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 8939] 1002 8939 41452 34 82 369 0 sshd
Jan 10 13:49:48 xxxxxxxxxxx kernel: [ 8940] 1002 8940 29232 13 15 229 0 ssh_force_comma
Jan 10 13:49:48 xxxxxxxxxxx kernel: [15784] 0 15784 32149 47 20 61 0 anacron
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20690] 0 20690 55981 319 64 0 0 sudo
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20694] 1000 20694 29099 96 14 0 0 gateway_control
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20697] 1000 20697 6904255 4553813 9295 75 0 java
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20698] 1000 20698 27792 68 12 0 0 logger
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20699] 1000 20699 29099 96 12 0 0 gateway_control
Jan 10 13:49:48 xxxxxxxxxxx kernel: [20700] 1000 20700 27796 68 12 0 0 cat
Jan 10 13:49:48 xxxxxxxxxxx kernel: [23067] 89 23067 23340 315 45 0 0 pickup
Jan 10 13:49:48 xxxxxxxxxxx kernel: Out of memory: Kill process 20697 (java) score 520 or sacrifice child
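For context, the total_vm and rss columns in the task dump above are counted in pages (4 KiB on this x86_64 kernel), so the killed java process (PID 20697) amounted to roughly:

    rss:      4553813 pages x 4 KiB ≈ 17.4 GiB resident
    total_vm: 6904255 pages x 4 KiB ≈ 26.3 GiB virtual

In other words, the gateway JVM alone was holding roughly 17 GiB of physical memory by the time the OOM killer fired.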
Release : 10.1
If we check the messages file, we can see that VMware is ballooning memory away from the virtual machine, which is causing the OOM killer to kill processes in order to free up memory:
Jan 10 11:57:52 xxxxxxxxxx kernel: CPU: 4 PID: 5578 Comm: kworker/4:2 Not tainted 3.10.0-1160.11.1.el7.x86_64 #1
Jan 10 11:57:52 xxxxxxxxxx kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Jan 10 11:57:52 xxxxxxxxxx kernel: Workqueue: events_freezable vmballoon_work [vmw_balloon]
Jan 10 11:57:52 xxxxxxxxxx kernel: Call Trace:
Jan 10 11:57:52 xxxxxxxxxx kernel: [<ffffffffabd80faa>] dump_stack+0x19/0x1b
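The balloon driver's current footprint can also be checked from inside the guest, for example (assuming open-vm-tools / VMware Tools is installed on the appliance):

    # balloon size as currently reported by VMware Tools, in MB
    vmware-toolbox-cmd stat balloon

    # balloon driver activity recorded in the system log
    grep vmballoon /var/log/messages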
Check with the VM team whether there is a memory shortage on the ESXi host that is causing the VMware balloon driver to reclaim memory from the VM, or consider disabling memory ballooning for this virtual machine.
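If disabling ballooning is the chosen route, one common approach (an illustrative sketch; confirm the exact procedure against the VMware documentation for your ESXi version) is to power off the VM and set the following advanced configuration parameter in the VM's .vmx file or via the vSphere client:

    sched.mem.maxmemctl = "0"

Alternatively, configuring a full memory reservation for the VM on the ESXi side prevents the host from ballooning (or swapping) its memory in the first place.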