GemFire Cache Nodes Stuck in Initializing Status During JAR Deployment

Article ID: 436426

Products

VMware Tanzu Data Intelligence

Issue/Introduction

When you deploy an application JAR file to a VMware Tanzu GemFire cluster with the gfsh deploy --jar command, the following symptoms may be observed:

  • The deploy --jar command hangs and does not complete.
  • After restarting cluster services (Locators and Servers) to troubleshoot, the cache server nodes remain stuck in "initializing" status indefinitely.
  • Servers fail to transition to an "online" or "running" state.
  • Logs may indicate connection timeouts or failures related to the RMI connector or JMX Manager.
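To narrow down which log lines are worth reading, the following Python sketch filters a server or locator log for lines that mention timeouts, RMI, or JMX. The keyword pattern is an assumption derived from the symptom description above; it does not match specific GemFire log messages and will produce some false positives.

```python
import re
from typing import Iterable, List

# Assumed keywords only: "timeout"/"timed out", and words starting with
# "rmi" or "jmx" (e.g. RMIConnector, jmx-manager-port). These are NOT
# exact GemFire log messages, just broad hints.
SUSPECT_PATTERN = re.compile(r"time[d]?\s*out|\brmi|\bjmx", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention timeouts, RMI, or JMX (newline stripped)."""
    return [line.rstrip("\n") for line in lines if SUSPECT_PATTERN.search(line)]
```

For example, with open("server.log", encoding="utf-8", errors="replace") as log: print("\n".join(suspicious_lines(log))) prints the matching lines; "server.log" is a hypothetical path, so substitute the actual log file of the stuck member.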

Environment

  • VMware Tanzu GemFire (all versions supporting Cluster Configuration Service)
  • Distributed setups with separate Locator and Server nodes.

Cause

The issue is typically caused by a network connectivity failure on the JMX Manager port (default: 1099).

When a JAR is deployed, the Cluster Configuration Service stores it on the Locators. Upon startup, each cache server must connect to the JMX Manager running on a Locator (over JMX/RMI) to download the cluster configuration and any deployed JAR files. If the JMX Manager port (1099 by default) is blocked by a firewall or network security group, the servers cannot retrieve these files, and the initialization process hangs.
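A firewall that silently drops packets usually makes a TCP connection attempt wait until it times out, rather than fail immediately, which is consistent with the hang described above. The following Python sketch (not a GemFire tool) classifies a connection attempt to a given host and port as "open", "refused", or "timeout", so that a dropped port can be told apart from one where nothing is listening. The function name and the 5-second default timeout are illustrative assumptions.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to host:port.

    'open'     -> the TCP handshake completed
    'refused'  -> nothing listening, or a firewall actively rejecting
    'timeout'  -> typically a firewall silently dropping packets
    'error: …' -> any other socket error (e.g. host unreachable)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # must come last: the cases above subclass OSError
        return f"error: {exc.strerror or exc}"
```

Run from a cache server host against the Locator's JMX Manager port, a result of "timeout" points toward packet filtering, while "refused" suggests the JMX Manager is not listening on that port.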

Resolution

To resolve this issue, follow the steps below:

Verify Network Connectivity

Confirm that the internal network allows bidirectional communication on the JMX manager port.

  • Identify the configured jmx-manager-port (the default is 1099).
  • Test connectivity from each cache server host to each locator. A connection that hangs or times out suggests the port is being filtered:
         telnet [locator-ip] [jmx-manager-port]
  • Ensure that any firewalls (including NSX or local OS firewalls) allow traffic on all required GemFire ports, such as the locator port (default 10334), the JMX manager port, and the cache server port (default 40404).
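The checks above can be scripted when there are several servers and locators. The following Python sketch reads jmx-manager-port from gemfire.properties-style text (falling back to 1099 when it is not set) and then attempts a TCP connection to that port on each locator. The parser is deliberately simplified (no escapes or line continuations), it does not see values passed on the command line, and the function names and host names are hypothetical.

```python
import re
import socket
from typing import Dict, Iterable

DEFAULT_JMX_MANAGER_PORT = 1099  # GemFire default for jmx-manager-port


def jmx_manager_port(properties_text: str) -> int:
    """Return jmx-manager-port from gemfire.properties-style text, or 1099.

    Simplified: accepts 'key=value', 'key: value' and 'key value' lines,
    skips '#'/'!' comments, and lets the last occurrence win.
    """
    port = DEFAULT_JMX_MANAGER_PORT
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        match = re.match(r"([^=:\s]+)\s*[=:]?\s*(.*)$", line)
        if match and match.group(1) == "jmx-manager-port":
            port = int(match.group(2).strip())
    return port


def check_locators(locators: Iterable[str], port: int,
                   timeout: float = 5.0) -> Dict[str, bool]:
    """Map each locator host to True if a TCP connection to port succeeds."""
    results: Dict[str, bool] = {}
    for host in locators:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:  # refused, timed out, unreachable, DNS failure, ...
            results[host] = False
    return results
```

For example, check_locators(["locator1.example.com", "locator2.example.com"], jmx_manager_port(open("gemfire.properties").read())) run on a cache server host reports which locators are unreachable on the JMX manager port; the host names and file path are placeholders.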