GemFire: Kubernetes Pods Silent Exit / CacheClosedException at Startup

Article ID: 434316

Products

VMware Tanzu GemFire

Issue/Introduction

A Java-based GemFire client application running in Kubernetes may exit during startup or shortly after connecting to the GemFire cluster, logging CacheClosedException and entering a CrashLoopBackOff state.

Symptoms

  • Log error similar to:
    org.apache.geode.cache.CacheClosedException: The cache is closed.
  • Application appears to start normally (for example, Spring application context initialization completes) but the pod terminates once GemFire regions are created.
  • The command "kubectl get pods" shows the pod in CrashLoopBackOff or with an increasing restart count (see the example output after this list).
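
For example (the pod name, restart count, and age below are illustrative):

    $ kubectl get pods
    NAME                             READY   STATUS             RESTARTS   AGE
    gemfire-client-7d9f8b6c4-x2lqs   0/1     CrashLoopBackOff   6          12m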

Environment

  • Java-based GemFire client applications (Spring or non-Spring) running in Kubernetes
  • GemFire 10.x or later 
  • Kubernetes clusters where pod resources.limits.memory is set close to or below actual runtime usage

Cause

The pod is terminated by the Linux kernel OOM killer because the container exceeds its configured memory limit; Kubernetes reports this as OOMKilled.

When the GemFire client starts and connects to the cluster it performs memory-intensive operations:

  • Handshake and metadata exchange: Downloading region metadata, cluster configuration, and PDX serialization registries.
  • Connection pool initialization: Creating multiple client connections, which consume non-heap and native memory.

If limits.memory is too low or the JVM heap (-Xmx) is configured too close to the container limit, total memory usage (Heap + Metaspace + native + thread stacks) briefly exceeds the limit. The Linux kernel OOM killer then terminates the process, often before a Java OutOfMemoryError can be logged, resulting in only final “cache is closing” messages as shutdown hooks run.
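
To confirm the cause, inspect the container's last terminated state; a Reason of OOMKilled with exit code 137 (SIGKILL) indicates the memory limit was exceeded. The pod name below is illustrative:

    $ kubectl describe pod gemfire-client-7d9f8b6c4-x2lqs
    ...
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137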

Resolution

Increase the Kubernetes memory limit (resources.limits.memory) and align the JVM settings (-Xmx or -XX:MaxRAMPercentage) so there is sufficient headroom for heap, Metaspace, thread stacks, and native memory. A common starting point is to cap the heap at roughly 50-75% of the container limit.
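
A minimal pod-spec sketch of this alignment is shown below. The container name, image, and sizes are illustrative placeholders rather than tuned recommendations, and JAVA_TOOL_OPTIONS is used only as one common way to pass JVM flags:

    containers:
      - name: gemfire-client                             # placeholder name
        image: registry.example.com/gemfire-client:1.0   # placeholder image
        resources:
          requests:
            memory: "2Gi"    # request what the application actually needs
          limits:
            memory: "2Gi"    # keep limit = request for predictable behavior
        env:
          - name: JAVA_TOOL_OPTIONS
            # Cap the heap well below the container limit so Metaspace,
            # thread stacks, and native memory (sockets and buffers created
            # during pool initialization) also fit inside the limit.
            value: "-XX:MaxRAMPercentage=60"

With a 2Gi limit, -XX:MaxRAMPercentage=60 caps the heap near 1.2Gi, leaving roughly 800Mi of headroom for non-heap memory during the startup handshake.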

Workaround

If immediate resource changes are constrained:

  • Temporarily reduce the number of regions or the amount of data eagerly initialized at startup.
  • Reduce the client connection pool size so fewer sockets are created during initialization (see the sketch after this list).
  • Disable or defer any heavy startup tasks (warm-up queries, bulk loads) until after the client is fully connected and stable.
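
As a sketch of the pool-size workaround: for Spring Boot for Apache Geode/GemFire clients, the pool can typically be shrunk through the spring.data.gemfire.pool.max-connections property (an assumption to verify against your framework version), passed here as a JVM system property:

    env:
      - name: JAVA_TOOL_OPTIONS
        # Assumed Spring Boot for Apache Geode property; verify the exact
        # name for your version. Non-Spring clients can reduce the pool in
        # code via PoolFactory.setMaxConnections().
        value: "-Dspring.data.gemfire.pool.max-connections=8"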

Additional Information

Prevention / Best practices

  • Right-size Metaspace: Avoid overly restrictive -XX:MaxMetaspaceSize; allow room for GemFire classes, PDX types, and framework libraries.
  • Health probes: Configure liveness and readiness probes with sufficient initialDelaySeconds and timeouts so probes do not restart the pod during GemFire’s memory-intensive startup phase (see the probe sketch after this list).
  • Monitor memory usage: Use Kubernetes metrics, JVM tools (JFR/JMX), or Prometheus/Grafana dashboards to monitor heap, Metaspace, and RSS over time and adjust limits accordingly.
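
A probe sketch with generous startup allowances follows; the /actuator/health endpoints assume Spring Boot Actuator, and all values are illustrative starting points to tune against observed startup times:

    livenessProbe:
      httpGet:
        path: /actuator/health/liveness    # assumes Spring Boot Actuator
        port: 8080
      initialDelaySeconds: 120             # cover metadata download and pool creation
      periodSeconds: 10
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness   # assumes Spring Boot Actuator
        port: 8080
      initialDelaySeconds: 60
      timeoutSeconds: 5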

Reference

VMware Tanzu GemFire on Kubernetes