This article explains memory configuration in Pivotal Greenplum (GPDB).
Pivotal Greenplum consists of multiple segments. When a query is executed, the query plan is prepared on the master and then dispatched to the segments, where it is executed. Each segment executes the query plan on its own (with some communication among segments, of course), using its own memory allocation, its own local transactions, and its own data files.
The total amount of memory a segment is allowed to allocate at any time is controlled by the gp_vmem_protect_limit parameter. This means that all sessions (session processes) executing at a given time on a specific segment should together use no more than this amount of memory. The next session that crosses the limit is terminated with an Out of Memory (OOM) VM Protect error. Note that this may not be the session using the most memory, but simply the session that crosses the limit.
Example:
gp_vmem_protect_limit=8GB; Session 1: 4GB (running); Session 2: 3GB (running); Session 3: 1.5GB (terminated with an OOM error because it crossed the 8GB limit).
To arrive at a gp_vmem_protect_limit value for a particular cluster, use the calculator at http://greenplum.org/calc/. Hint: add 2 to the number of primary segments per host to account for any failovers. Physical RAM and swap can be found with the command free -g, which shows the values in GB.
To change gp_vmem_protect_limit, a Greenplum instance restart is required. Follow these instructions to change it:
gpconfig -s gp_vmem_protect_limit
gpconfig -c gp_vmem_protect_limit -v <value-from-vmem-calc-in-MB>
For example: gpconfig -c gp_vmem_protect_limit -v 10922
A restart (gpstop followed by gpstart) is required after this, so there will be downtime for the change to take effect:
gpstop -a
gpstart -a
gpconfig -s gp_vmem_protect_limit -->> should return 10922
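For reference, the following is a minimal sketch of the arithmetic behind the calculator linked above, assuming the commonly documented sizing formula gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7; the host values are placeholders, and the calculator output should be treated as authoritative:
# Placeholder host values; take RAM and swap from "free -g".
RAM_GB=48
SWAP_GB=48
SEGMENTS=8          # primary segments per host + 2 for failovers
# gp_vmem: memory (GB) available to all Greenplum processes on the host (assumed formula)
GP_VMEM_GB=$(echo "(($SWAP_GB + $RAM_GB) - (7.5 + 0.05 * $RAM_GB)) / 1.7" | bc -l)
# gp_vmem_protect_limit is a per-segment value, expressed in MB
echo "gp_vmem_protect_limit (MB) = $(echo "$GP_VMEM_GB * 1024 / $SEGMENTS" | bc)"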
Change vm.overcommit_ratio. As root:
cp -p /etc/sysctl.conf /etc/sysctl.conf.<date>
vi /etc/sysctl.conf
Below vm.overcommit_memory, add the line vm.overcommit_ratio = <value-from-vmem-calc> (for example, 90), then save and exit the file and run:
sysctl -p
To verify the change:
cat /proc/sys/vm/overcommit_ratio -->> should return 90
Note 1
Allowing GPDB segments to allocate up to the full amount of physical memory can lead to total memory usage slightly above physical memory, because the kernel, mirror segments, and other applications also use some memory; however, it should not be much more. This may cause a little paging but, in general, will not lead to excessive swapping.
Note 2
If there are other memory-intensive applications or processes running on the server, the gp_vmem_protect_limit should be adjusted accordingly: lower the amount of available server memory used in the calculation by the amount those processes consume.
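As a rough sketch (the numbers are illustrative, not recommendations), this simply means subtracting the other processes' memory from the RAM figure before running the calculation:
RAM_GB=48
OTHER_APPS_GB=8      # estimated memory used by co-located memory-intensive processes
AVAILABLE_RAM_GB=$(( RAM_GB - OTHER_APPS_GB ))
# Use AVAILABLE_RAM_GB instead of RAM_GB when deriving gp_vmem_protect_limit.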
Note 3
If the preference is to allow some swapping rather than OOM errors, the gp_vmem_protect_limit can be configured so that part of the swap space is included in the calculation. This will result in fewer OOM errors, but some sessions may be slowed down by swap-in/swap-out activity. Also, make sure you understand the vm.overcommit_* parameters so that the total available server memory at the OS level is not exhausted (a hard OOM error).
Note 4
On a DCA with RAM=48GB, swap=48GB, vm.overcommit_memory=2, and vm.overcommit_ratio=50, the total memory available before the OS returns OOM is swap + 50% of RAM = 48 + 24 = 72GB. The total memory used by all processes on this server should stay below 72GB. Adjust gp_vmem_protect_limit accordingly.
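The arithmetic in Note 4 can be checked with a small sketch like the one below; the values mirror the DCA example, and the /proc/meminfo fields are standard Linux counters rather than anything Greenplum-specific:
RAM_GB=48
SWAP_GB=48
OVERCOMMIT_RATIO=50     # vm.overcommit_ratio
# With vm.overcommit_memory=2 the OS hands out at most swap + ratio% of RAM
echo "OS commit limit: $(( SWAP_GB + RAM_GB * OVERCOMMIT_RATIO / 100 )) GB"
# A live host reports the same limit (in kB) and the current usage here:
grep -E 'CommitLimit|Committed_AS' /proc/meminfo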
The memory consumption policy is controlled by the gp_resqueue_memory_policy parameter. gp_resqueue_memory_policy=manual means that the work_mem parameter is used per operator node (this setting is obsolete and kept for compatibility only). The newer options, auto and eager_free, rely on the statement_mem parameter. statement_mem sets the amount of memory a statement can use; this memory is then distributed among the query plan operators according to each operator's need for memory. In auto mode the memory is distributed among the query plan operators statically and is not changed during plan execution. In eager_free mode the memory is reused within the plan execution: when a specific node or set of nodes has finished its data processing and is no longer needed, its memory is freed immediately and given to other nodes that are still running.
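As an illustration (the values below are examples, not recommendations), the policy and the default statement size can be inspected and changed cluster-wide with gpconfig:
gpconfig -s gp_resqueue_memory_policy            # show the current policy
gpconfig -c gp_resqueue_memory_policy -v eager_free
gpconfig -c statement_mem -v 256MB               # default per-statement memory
gpstop -u                                        # reload the configuration; some parameters may need a full restart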
Each session can control statement_mem; the parameter can be changed within the session between queries. This means that you can set statement_mem=100MB for a small query and then set statement_mem=2GB for a huge query, all within the same session. Every user can change statement_mem up to the value of max_statement_mem, which can be set only by a superuser in the postgresql.conf file.
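For example, within a single psql session (the database name and sizes are placeholders, and both values must stay at or below max_statement_mem):
psql -d mydb <<'SQL'
SET statement_mem = '100MB';
-- ... run the small query here ...
SET statement_mem = '2GB';
-- ... run the huge query here ...
SHOW max_statement_mem;   -- the ceiling a non-superuser setting cannot exceed
SQL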
Another way to limit statement memory is at the resource queue level. Resource queues have an active connections limit and a memory limit, and each session gets memory_limit / active_connections_limit memory. This holds unless statement_mem is configured higher than that value; in that case the session gets statement_mem amount of memory, but a session that would push usage over the resource queue memory limit has to wait until memory is freed.
Example 1
Resource queue (RQ) active sessions = 10, RQ memory limit = 8GB, statement_mem=125MB -> each session on the RQ gets 800MB (> 125MB); all sessions together use at most 8GB.
Example 2
RQ active sessions = 10, RQ memory limit = 8GB, statement_mem=125MB -> each session on the RQ would normally use 800MB. If the first session manually sets statement_mem=4GB (4GB > 800MB) and the second session sets statement_mem=3GB (3GB > 800MB), then 7GB of the RQ memory limit is already used. The third session can still run (7GB + 800MB = 7.8GB), but Session 4 (which needs another 800MB) will be held by the resource manager until memory is freed by the other sessions running on this queue.
There are many different OOM errors that come from different parts of the database. They fall into two general groups:
- "VM Protect" errors ("VM Protect failed to allocate %d bytes, %d MB available"): these are logical OOM errors. They mean that the database is stopping a session from allocating more memory because the gp_vmem_protect_limit has been reached; the sessions executing on this segment are together attempting to use more than the configured limit. The appropriate action in this case is to determine whether the memory parameters are set properly and whether they can support the concurrency in the system.
Example 1
gp_vmem_protect_limit=8GB, statement_mem=512MB -> max concurrency of 16 can be supported (16 * 512MB = 8GB).
Example 2
gp_vmem_protect_limit=8GB, statement_mem=256MB -> max concurrency of 32 can be supported (32 * 256MB = 8GB).
- OS-level OOM errors ("Out of memory. Failed on request of size %lu bytes."): these are OS OOM errors. They mean that the database allows sessions to allocate memory, but the OS has no memory left. Basically, this means that gp_vmem_protect_limit is overcommitted relative to the memory available on the server.
Example 3
RAM=48GB, SWAP=48GB, 6 primary segments, gp_vmem_protect_limit=8GB -> total memory available per server = 72GB (48 + 50% of 48); total memory the database can use = 6 x 8 = 48GB; total memory used by the server ≈ 50GB (including kernel, mirror segments, etc.); 50GB < 72GB, so no OS OOM errors.
Example 4
RAM=48GB, SWAP=48GB, 6 primary segments, gp_vmem_protect_limit=8GB, 10 JVMs using 4GB each -> total memory available per server = 72GB (48 + 50% of 48); total memory the database can use = 6 x 8 = 48GB; total memory used by the server ≈ 50GB (including kernel, mirror segments, etc.) + 10 x 4GB (JVMs) = 90GB; 90GB > 72GB, so some process (possibly GPDB) will get an OS OOM error.
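The back-of-the-envelope checks behind the examples above can be written down as a small sketch (all numbers are the illustrative values used in Examples 1 through 4):
# Supportable concurrency per segment (Examples 1 and 2)
LIMIT_MB=8192              # gp_vmem_protect_limit
STATEMENT_MEM_MB=512
echo "max concurrent statements per segment: $(( LIMIT_MB / STATEMENT_MEM_MB ))"
# Server-level check (Examples 3 and 4)
COMMIT_LIMIT_GB=72         # swap + 50% of RAM, see Note 4
DB_GB=$(( 6 * 8 ))         # 6 primary segments * 8GB gp_vmem_protect_limit
OTHER_GB=$(( 10 * 4 ))     # 10 JVMs at 4GB each (Example 4 only)
OVERHEAD_GB=2              # kernel, mirror segments, etc. (approximate)
TOTAL_GB=$(( DB_GB + OTHER_GB + OVERHEAD_GB ))
[ "$TOTAL_GB" -gt "$COMMIT_LIMIT_GB" ] && echo "risk of OS OOM: ${TOTAL_GB}GB > ${COMMIT_LIMIT_GB}GB"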
Applies to: Pivotal Greenplum, all versions.