Mirrorless Greenplum Cluster: Segments Fail to Restart with "Errno 12: Cannot allocate memory"

Article ID: 438347

Updated On:

Products

VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire

Issue/Introduction

In a mirrorless VMware Tanzu Greenplum cluster (often running on PowerFlex), the database becomes unavailable to users because multiple primary segments transition to a "Down" state. Both automatic restarts and manual attempts by an administrator to bring the segments back online fail with the following errors in the system logs or segment journals:

ERROR[8005] failed to terminate postgres child processes on port [PORT]: error: fork/exec /bin/bash: cannot allocate memory
ERROR[8004] failed to restart segment [PATH]: pg_ctl: server does not shut down
OSError: [Errno 12] Cannot allocate memory

This prevents the database from recovering, as the OS cannot fork the necessary processes to either terminate lingering sessions or start the postmaster for the affected segments.
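
To check how often a host has hit this condition, the signatures above can be counted in a log file. The following is a minimal sketch, not a supported tool: the function name count_enomem_lines is hypothetical, and the log location differs between installations, so the path is passed in as an argument.

```shell
#!/usr/bin/env bash
# count_enomem_lines (hypothetical helper): print the number of lines in a
# log file that contain "cannot allocate memory", matched case-insensitively
# so that both the fork/exec message and the OSError message are counted.
count_enomem_lines() {
  local logfile="$1"
  # grep -c prints the count of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps the helper usable under "set -e".
  grep -ci -- 'cannot allocate memory' "$logfile" || true
}
```

For example, count_enomem_lines /path/to/segment.log (a placeholder path) prints a single number. A non-zero count on a segment host suggests that the host has hit the failure described here.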

Cause

The root cause is memory exhaustion at the operating system level (distinct from a query hitting Greenplum's own gp_vmem_protect_limit).

A complex query or a high-concurrency event can cause the segment nodes to consume all available physical memory and swap space. In that state the Linux kernel cannot fulfill fork() or exec() requests, and callers receive the "Errno 12: Cannot allocate memory" error.
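
One way to see whether a segment host is close to this state is to read the kernel's own counters. The sketch below assumes a Linux host whose /proc/meminfo exposes the MemAvailable and SwapFree fields. The function name mem_headroom_kb is hypothetical, and the optional file argument exists only so the parsing can be tried against a saved copy of meminfo.

```shell
#!/usr/bin/env bash
# mem_headroom_kb (hypothetical helper): print MemAvailable + SwapFree in kB.
# Reads /proc/meminfo by default; another meminfo-format file may be passed.
mem_headroom_kb() {
  local meminfo="${1:-/proc/meminfo}"
  # In each "Name:   value kB" line, field 2 holds the numeric value.
  awk '
    /^MemAvailable:/ { avail = $2 }
    /^SwapFree:/     { swap  = $2 }
    END              { print avail + swap }
  ' "$meminfo"
}
```

A result close to zero on a segment host is consistent with the exhaustion described above. What counts as too low depends on the workload, so no threshold is suggested here.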

In mirrorless configurations, the database lacks the redundancy to fail over, meaning any segment that fails to restart due to this memory pressure results in full cluster unavailability.

The failure of pg_ctl to shut down segments often occurs because the shutdown process itself requires forking a process that the depleted OS memory cannot support.

Resolution

To restore the cluster and stabilize memory usage, follow these steps:

  1. Manual Cleanup of Lingering Processes: Identify any orphaned postgres processes that are still bound to the segment ports (e.g., 6000, 6002). Use lsof -i :[PORT] or ps -ef | grep postgres to find them, then terminate them with kill -9 [PID] so that the ports can be reclaimed.
  2. Verify OS-Level Limits: Check ulimit -u (max user processes) on the affected segment nodes. If it is set too low, the OS can refuse to create new processes even if physical RAM is available. Increase this limit according to Greenplum best practices.
  3. Implement Query Plan Protections: Set the gp_max_plan_size GUC to prevent exceptionally large query plans from exhausting system memory. Large plans are a frequent trigger for these OOM fork errors (see "How to identify out of memory (OOM) errors").
  4. Adjust Memory Protection Limits: Recalculate and reduce gp_vmem_protect_limit so that the total memory allocated to all segments on a host does not exceed the physical RAM minus the requirements of the OS and other background processes (see "Pivotal Greenplum Memory Configuration").
  5. Restart the Cluster: Once lingering processes are cleared and limits are adjusted, perform a clean restart using gpstop -ar.
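
The checks in steps 2 and 4 can be sketched as two small shell helpers. Both function names are hypothetical. The sizing arithmetic follows the formula as recalled from the Greenplum memory configuration documentation: gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7, with a divisor of 1.17 for hosts with 256 GB of RAM or more, divided by the number of primary segments per host. Treat the formula and the required process limit as assumptions to verify against the documentation for the installed version.

```shell
#!/usr/bin/env bash
# nproc_limit_ok (hypothetical helper): succeed if the max-user-processes
# limit is at least REQUIRED. CURRENT defaults to this shell's "ulimit -u".
nproc_limit_ok() {
  local required="$1"
  local current="${2:-$(ulimit -u)}"
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  [ "$current" -ge "$required" ]
}

# gp_vmem_protect_limit_mb (hypothetical helper): print a candidate
# gp_vmem_protect_limit in MB for one host.
#   $1 = RAM in GB, $2 = swap in GB, $3 = primary segments on the host
# ASSUMPTION: formula and constants as recalled from the Greenplum memory
# configuration documentation; confirm them before applying the result.
gp_vmem_protect_limit_mb() {
  local ram_gb="$1" swap_gb="$2" primaries="$3"
  awk -v ram="$ram_gb" -v swap="$swap_gb" -v n="$primaries" 'BEGIN {
    if (n <= 0) { exit 1 }                     # avoid division by zero
    divisor = (ram >= 256) ? 1.17 : 1.7        # large hosts use 1.17
    gp_vmem_gb = ((swap + ram) - (7.5 + 0.05 * ram)) / divisor
    printf "%d\n", gp_vmem_gb * 1024 / n       # GB to MB, then per segment
  }'
}
```

For example, nproc_limit_ok 131072 run as the database OS user tests the current shell against 131072, the nproc value recalled from the Greenplum installation guidance (an assumption to confirm). gp_vmem_protect_limit_mb 128 32 4 prints a candidate limit for a host with 128 GB of RAM, 32 GB of swap, and four primaries. In a mirrorless cluster the primaries-per-host count does not grow during failover, so the configured count is used directly.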