GemFire Server Startup Seems to Hang Indefinitely



Article ID: 294126



Products

VMware Tanzu GemFire

Issue/Introduction

Symptoms:

This article describes how to resolve an issue in which GemFire servers take a very long time to start and the gfsh command "list regions" appears to hang.

This issue may be indicated by the following symptoms:

  • When servers are started using gfsh, the progress dots printed after the start server command never stop.
  • With "fine"-level logging enabled, the server log contains an exception similar to the following:

    [fine 2016/07/26 17:28:33.529 EDT server1.1.staging <Function Execution Processor1> tid=0x181] GemFire:service=Region,name=/TESTREGION,type=Member,member=server1.1.staging
    javax.management.InstanceNotFoundException: GemFire:service=Region,name=/TESTREGION,type=Member,member=server1.1.staging
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    
  • The region named in the log (/TESTREGION in the example above) is not defined anywhere in cache.xml and is not created programmatically.
  • The gfsh command "list regions" does not return and appears to hang indefinitely.
  • Pulse does not show any regions.
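To check a server log for the exception described above, a small shell helper along the following lines may be useful. It is only a sketch: the function name is hypothetical, and it assumes the log lines have the same shape as the excerpt in this article.

```shell
# Print the region names that appear in InstanceNotFoundException lines.
# Assumption: the exception line matches the format of the excerpt above,
# i.e. "...service=Region,name=<region>,type=Member,...".
missing_region_mbeans() {
    # $1 is the path of a server log file supplied by the caller.
    grep 'javax.management.InstanceNotFoundException' "$1" \
        | sed -n 's/.*service=Region,name=\([^,]*\),.*/\1/p' \
        | sort -u
}
```

For example, `missing_region_mbeans server1.log` (the file name is a placeholder) would print one region name per line. Any region listed that is not defined in cache.xml is a candidate for the cause described below.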

Environment


Cause

This problem occurs when:

  1. The cluster configuration service is enabled ("enable-cluster-configuration=true" on locators and "use-cluster-configuration=true" on servers).
  2. Some persistent regions are created using gfsh and are populated with data.
  3. The servers are restarted with a new cache.xml, supplied as a parameter to the start server command, that does not define the regions created in step 2.

In this scenario, because the cluster configuration service is enabled, the locator has already distributed a jar file to all the servers before the restart. This jar file contains an XML file that GemFire generated from the gfsh commands that were run previously. For example, if the region /TESTREGION was created earlier using gfsh and the servers are then restarted with a cache.xml that does not define TESTREGION, the startup hangs.
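One way to confirm this mismatch is to compare the regions declared in the XML generated by the cluster configuration service with those in the new cache.xml. The sketch below assumes both files are available locally and that each region is declared as `<region name="...">` with `name` as the first attribute. The function names are hypothetical, and the location of the generated XML is not stated in this article.

```shell
# Print the region names declared as <region name="..."> in an XML file.
# Assumption: "name" is the first attribute of each <region> element.
region_names() {
    grep -o '<region[[:space:]]\{1,\}name="[^"]*"' "$1" \
        | sed 's/.*name="\([^"]*\)".*/\1/' \
        | sort -u
}

# Print regions declared in the first file ($1) but not in the second ($2).
regions_missing_from() {
    region_names "$1" | while IFS= read -r name; do
        # -x matches the whole line and -F treats the name as a fixed string.
        region_names "$2" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}
```

Calling `regions_missing_from generated.xml cache.xml` (both file names are placeholders) lists the regions that exist only in the generated configuration, such as TESTREGION in the example above.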

Resolution

Follow these steps to resolve this issue:

  • Stop all the servers and locators.
  • Disable the cluster configuration service (--enable-cluster-configuration=false on locators and --use-cluster-configuration=false on servers).
  • Delete the contents of the /cluster/cluster_config directory in each locator's working directory.
  • Set the parameter gemfire.disk.recoverValues=false for the servers. This property prevents the persisted values from being loaded at server startup; only the keys are loaded.
  • Start all the locators and servers.
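The steps above can be sketched as follows. The shell function covers only the cleanup step and takes the cluster configuration directory as an argument. Its name is hypothetical, and it should be run only while the locators are stopped. The gfsh commands in the trailing comments use placeholder member names (locator1, server1) and a placeholder cache.xml path; adapt them to the actual cluster.

```shell
# Empty a locator's cluster configuration directory without removing
# the directory itself. $1 is the directory path supplied by the caller.
clear_cluster_config() {
    config_dir="$1"
    if [ ! -d "$config_dir" ]; then
        echo "not a directory: $config_dir" >&2
        return 1
    fi
    # -mindepth 1 excludes the directory itself; -delete removes the rest.
    find "$config_dir" -mindepth 1 -delete
}

# Restart with the cluster configuration service disabled and value
# recovery turned off, typed at the gfsh prompt (names are placeholders):
#   start locator --name=locator1 --enable-cluster-configuration=false
#   start server --name=server1 --cache-xml-file=cache.xml --use-cluster-configuration=false --J=-Dgemfire.disk.recoverValues=false
```

The `--J` option passes the system property to the server JVM, which is one way to set gemfire.disk.recoverValues for a server started from gfsh.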