GemFire cluster startup fails with error "[error][gc] Failed to commit memory (Not enough space)"

Products

VMware Tanzu Data Suite

Issue/Introduction

The error message 'Failed to commit memory (Not enough space)' when you start a locator is a direct signal from the JVM that the operating system could not provide the memory it requested. A typical failure looks like this:

[0.014s][error][gc] Failed to commit memory (Not enough space)
[0.015s][error][gc] Failed to commit memory (Not enough space)
[0.015s][error][gc] Forced to lower max Java heap size from 2048M(100%) to 784M(38%)
[0.015s][error][gc] Failed to allocate initial Java heap (2048M)
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Exception occurred when starting the locator, please check the locator log for details

This article covers the OS-level configuration needed to ensure that sufficient memory can be allocated to GemFire at startup.

Environment

All supported GemFire versions on Linux

Resolution

Follow these steps to identify and resolve OS-level memory bottlenecks.

1. Verify Physical Memory Availability

First, ensure the host has enough physical RAM available to satisfy the -Xms (Initial Heap) request.

bash-4.4$ free -m
              total        used        free      shared  buff/cache   available
Mem:         128400       40103       84102           0        4194       8716
Swap:          5119        2457        2662

Note: If the available memory is less than your GemFire heap settings, you must increase physical RAM, free memory used by other processes, or reduce the heap size.
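The comparison in this step can be scripted. The following is a minimal sketch, assuming a kernel that exposes MemAvailable in /proc/meminfo. The function names heap_fits and available_mb are illustrative and not part of GemFire.

```shell
# Hypothetical helper: succeed if available memory (MB) covers the requested heap (MB).
heap_fits() {
  local heap_mb=$1 avail_mb=$2
  if [ "$avail_mb" -ge "$heap_mb" ]; then
    echo "OK: ${avail_mb} MB available >= ${heap_mb} MB heap"
  else
    echo "INSUFFICIENT: ${avail_mb} MB available < ${heap_mb} MB heap"
    return 1
  fi
}

# Read MemAvailable from /proc/meminfo (reported in kB) and print it in MB.
available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}
```

On the GemFire host, `heap_fits 2048 "$(available_mb)"` would check a 2048 MB initial heap, the -Xms value seen in the error log above.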

2. Check and Set `memlock` Limits

The maximum locked memory limit (memlock) for the user running GemFire must be greater than the JVM heap size.

Check current limit:

ulimit -l

The value is reported in kilobytes. If the output is a small value (e.g., 64), it will cause startup failure.

How to Fix:

Edit /etc/security/limits.conf and, for the user running GemFire, set memlock to unlimited or to a value larger than your total JVM memory:

<user_name> soft memlock unlimited
<user_name> hard memlock unlimited

Note: You must log out and log back in for ulimit changes to take effect.
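After logging back in, the new limit can be compared against the heap size. This is a minimal sketch, assuming `ulimit -l` prints either a number in kilobytes or the word unlimited (as bash does). The function name memlock_ok is hypothetical.

```shell
# Hypothetical helper.
#   limit:   output of `ulimit -l` (kB, or "unlimited")
#   heap_mb: JVM heap size in MB
memlock_ok() {
  local limit=$1 heap_mb=$2
  local heap_kb=$((heap_mb * 1024))
  if [ "$limit" = "unlimited" ]; then
    echo "OK: memlock is unlimited"
  elif [ "$limit" -gt "$heap_kb" ]; then
    echo "OK: memlock ${limit} kB > heap ${heap_kb} kB"
  else
    echo "TOO LOW: memlock ${limit} kB <= heap ${heap_kb} kB"
    return 1
  fi
}
```

Run it as the user that starts GemFire, for example `memlock_ok "$(ulimit -l)" 2048` for a 2048 MB heap.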

3. Configure Huge Pages

If you are using -XX:+UseLargePages, the OS must have enough pre-allocated huge pages available. If the sum of all JVM heaps on a physical host (locators and servers) exceeds the available huge pages, a JVM started with the -XX:+AlwaysPreTouch flag will fail to commit memory.

Calculate Required Huge Pages:

To find the required number of pages, use the following formula:

vm.nr_hugepages = (Total Heap MB / HugePage Size in MB) * 1.05

(The 1.05 adds a 5% safety buffer for JVM overhead).
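The formula can be evaluated with shell integer arithmetic. The sketch below rounds the result up to a whole page, which is an assumption on top of the formula. The function name required_hugepages is hypothetical.

```shell
# Hypothetical helper: required_hugepages <total_heap_mb> <hugepage_size_mb>
# Integer form of (heap / page size) * 1.05, rounded up to a whole page.
required_hugepages() {
  local heap_mb=$1 page_mb=$2
  echo $(( (heap_mb * 105 + page_mb * 100 - 1) / (page_mb * 100) ))
}

required_hugepages 2048 2    # prints 1076
required_hugepages 32768 2   # prints 17204
```

For example, a 2048 MB heap with 2 MB huge pages gives 1024 × 1.05 = 1075.2, which rounds up to 1076 pages. Pass the sum of all JVM heaps on the host as the first argument.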

Check Current OS Huge Page Status:

bash-4.4$ grep -i huge /proc/meminfo
AnonHugePages:   1165312 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   18278
HugePages_Free:    12087
HugePages_Rsvd:      193
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        37433344 kB

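Look for Hugepagesize (usually 2048 kB) and HugePages_Total. If HugePages_Total is lower than the vm.nr_hugepages value you requested, the kernel could not find enough contiguous memory because physical memory is fragmented, and you need to reboot the machine so the pages can be reserved at boot.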
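To compare the huge page pool with the combined heap size, the free pages can be converted to megabytes. This is a minimal sketch that reads /proc/meminfo-style text on standard input. The function name free_hugepage_mb is hypothetical.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the memory still free in the huge page pool, in MB.
free_hugepage_mb() {
  awk '/^HugePages_Free:/ { free = $2 }
       /^Hugepagesize:/   { size_kb = $2 }
       END { print int(free * size_kb / 1024) }'
}
```

On the host, run `free_hugepage_mb < /proc/meminfo`. With the sample output above (12087 free pages of 2048 kB), the result is 24174 MB. Pages counted in HugePages_Rsvd are already promised to running processes, so the usable amount may be lower.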

How to Fix:

sudo nano /etc/sysctl.d/99-hugepages.conf
# Add the following line to the file and save it:
vm.nr_hugepages = <calculated_number>
# Then reload all sysctl configuration files:
sudo sysctl --system

4. Check Memory Overcommit Policy

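Set vm.overcommit_memory to 0 (heuristic overcommit, the Linux default). With a value of 2 (strict accounting), the kernel can refuse large allocations such as the JVM heap even when free memory exists.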

Check setting:

cat /proc/sys/vm/overcommit_memory

How to Fix:

sudo sysctl -w vm.overcommit_memory=0
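The value read from /proc/sys/vm/overcommit_memory can be interpreted with a small helper. This is a sketch only. The function name describe_overcommit and the drop-in file name in the comments are assumptions.

```shell
# Hypothetical helper: describe a vm.overcommit_memory value.
describe_overcommit() {
  case "$1" in
    0) echo "0: heuristic overcommit (kernel default)" ;;
    1) echo "1: always overcommit" ;;
    2) echo "2: strict accounting, large allocations may be refused" ;;
    *) echo "unknown value: $1"; return 1 ;;
  esac
}

# Usage on a Linux host:
#   describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
#
# sysctl -w does not survive a reboot. To persist the setting, a drop-in
# file can be used (the file name 99-overcommit.conf is an assumption):
#   echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf
#   sudo sysctl --system
```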

Note on Garbage Collection (GC)

In GemFire 10+, the default GC for many configurations is ZGC, which is recommended only for heaps larger than 32 GB. ZGC attempts to reserve the entire maximum heap size immediately at startup, so a locator using ZGC is much more likely to trigger this error than one using G1GC. Because locators typically have small heaps (< 4 GB), G1GC (-XX:+UseG1GC) is recommended for them.
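As an illustration, the helper below prints a gfsh command line that starts a locator with G1GC and a fixed heap. It is a sketch that assumes gfsh is on the PATH and that JVM arguments are passed with the --J option. The function name locator_cmd, the locator name, and the heap size are placeholders.

```shell
# Hypothetical helper: print a gfsh command that starts a locator with G1GC.
#   name: locator name (placeholder)
#   heap: value for both -Xms and -Xmx, e.g. 2g
locator_cmd() {
  local name=$1 heap=$2
  echo "gfsh start locator --name=${name} --J=-Xms${heap} --J=-Xmx${heap} --J=-XX:+UseG1GC"
}

locator_cmd locator1 2g
```

Review the printed command against the gfsh documentation for your GemFire version before running it.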