Linux oom-killer kills Greenplum queries in Greenplum Database 7.x

Article ID: 368651

Products

VMware Tanzu Data Suite Pivotal Data Suite Non Production Edition VMware Tanzu Data Suite Greenplum VMware Tanzu Greenplum

Issue/Introduction

Postgres processes are killed by the Linux Out of Memory Killer (oom-killer) when Resource Groups are used in Greenplum 7.x.

The journalctl output and/or the /var/log/messages file for the time of the issue shows:

May 19 15:00:58 mdw kernel: postgres invoked oom-killer:....
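
As a minimal sketch of how to search for these messages, the helper below filters log text for oom-killer invocations. The function name find_oom_events is hypothetical, and the time window in the usage comment is only an example.

```shell
# Hypothetical helper: print only the lines that record an oom-killer invocation.
# It reads log text on stdin, so it works with journalctl output or /var/log/messages.
find_oom_events() {
    grep -i 'invoked oom-killer'
}

# Example usage (run on the affected host; reading the logs may require root):
#   journalctl -k --since "2 hours ago" | find_oom_events
#   find_oom_events < /var/log/messages
```

If the process named in the matching line is "postgres", the killed process belongs to the database.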

Cause

The oom-killer is triggered by memory limits set in the cgroups on the host.

To check the cgroup settings on the host, run:

cgget -g memory:gpdb
cgget -g memory:gpdb/6437
cgget -g memory:gpdb/6438

The numbers "6437" and "6438" are the OIDs of the resource groups "default_group" and "admin_group", respectively. Other resource groups can also be checked.
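
To check every resource group rather than only the two defaults, the OIDs can be read from the database. The sketch below assumes the pg_resgroup catalog exposes "oid" and "rsgname" and that psql can connect to the "postgres" database; the helper name cgget_commands is hypothetical.

```shell
# Hypothetical helper: turn "oid|rsgname" lines (psql -At output) into the
# matching cgget command for each resource group.
cgget_commands() {
    while IFS='|' read -r oid name; do
        if [ -n "$oid" ]; then
            echo "cgget -g memory:gpdb/$oid   # $name"
        fi
    done
}

# Example usage (run on the coordinator host as the database admin user):
#   psql -d postgres -At -c "SELECT oid, rsgname FROM pg_resgroup ORDER BY oid" | cgget_commands
```

The function only prints the commands, so they can be reviewed before they are run on each host.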

Check the "memory.limit_in_bytes" setting for each cgroup. In Greenplum 7.x the value should be very large, such as "9223372036854771712". This is the largest signed 64-bit number (9223372036854775807) rounded down to a 4096-byte page boundary, which the kernel treats as "unlimited".
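
The check can be scripted along the following lines. This is a sketch that assumes a 4096-byte page size (typical on x86_64) and 64-bit shell arithmetic; the names UNLIMITED and is_unlimited are hypothetical.

```shell
# The "no limit" value: largest signed 64-bit integer rounded down to a
# multiple of the assumed 4096-byte page size.
UNLIMITED=$(( 9223372036854775807 / 4096 * 4096 ))

# Hypothetical helper: succeed if the given memory.limit_in_bytes value means "no limit".
is_unlimited() {
    [ "$1" = "$UNLIMITED" ]
}

# Example usage (on each host; -n drops the header, -v prints only the value):
#   for grp in gpdb gpdb/6437 gpdb/6438; do
#       val=$(cgget -n -v -r memory.limit_in_bytes "$grp")
#       if is_unlimited "$val"; then
#           echo "$grp: unlimited"
#       else
#           echo "$grp: limited to $val bytes"
#       fi
#   done
```

Any cgroup reported as limited is a candidate for the oom-killer behavior described in this article.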

In Greenplum 7.x, Greenplum Database does NOT set limits in the cgroups. It uses the cgroups only to track memory usage.

In Greenplum 6.x, the limit in the cgroups is set by Greenplum Database and corresponds to the "MEMORY_LIMIT" value of the resource group, so this KB does NOT apply to Greenplum 6.x.

The limits may have been set because a Greenplum Database 6.x instance was previously running on the cluster.

The limits may be set in the cgroups config file, /etc/cgconfig.conf.
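
One way to look for such a setting is to search the config file for uncommented "memory.limit_in_bytes" lines. The helper name find_memory_limits below is hypothetical, and the sketch inspects only the single file it is given.

```shell
# Hypothetical helper: list uncommented memory.limit_in_bytes settings, with
# line numbers, in a cgconfig file. Defaults to /etc/cgconfig.conf.
find_memory_limits() {
    grep -n 'memory\.limit_in_bytes' "${1:-/etc/cgconfig.conf}" \
        | grep -v '^[0-9]*:[[:space:]]*#'
}

# Example usage (on each host):
#   find_memory_limits || echo "no memory limits found in /etc/cgconfig.conf"
```

The function returns a non-zero status when no matching line is found, which is the desired state for Greenplum 7.x.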

Resolution

Ensure there are no limits set in the cgroups config file.

Shut down the database.

Reboot all hosts in the cluster to clear the cgroups limits.

Restart the database.

Verify with the "cgget" commands above that no memory limits are set for the cgroups.