Exported or manually crafted cluster configs causing cache server to not start when using cluster configuration service in VMware GemFire



Article ID: 294344


Updated On:

Products

VMware Tanzu GemFire

Issue/Introduction

In this scenario, cluster configs are exported so they can be used to start cache servers and connect them to the cluster. However, the exported cluster configs can contain corrupted files and cause an error on server start, for the reasons explained below:
 

  • The cluster.xml file was created by hand, or was manually changed after being exported from the cluster and before being imported back.
  • The group.xml file was manually tweaked and PdxType definitions were added to the group config file. PdxTypes apply cluster-wide and are not supported at the group level; adding them to a group config file means they are either silently ignored or cause an error on server start-up.
  • JAR files are duplicated in the CLASSPATH as well as inside the cluster and group folders of the cluster-config service ZIP bundle.

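For example, a group config file that has been hand-edited to include a Pdx definition would look roughly like the following sketch (the disk store name is illustrative). The <pdx> element belongs in cluster.xml only and must be removed from any group-level file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns="http://geode.apache.org/schema/cache"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd"
       version="1.0">
  <!-- WRONG at group level: Pdx configuration is cluster-wide only.
       Remove this element from group config files. -->
  <pdx read-serialized="true" persistent="true" disk-store-name="pdxDiskStore"/>
</cache>
```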

Error message in log file:

[error 2020/05/11 12:27:53.212 PDT <main> tid=0x1] Cache initialization for GemFireCache[id = 439636632; isClosing = false; isShutDownAll = false; created = Mon May 11 12:27:52 PDT 2020; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed because: org.apache.geode.pdx.PdxInitializationException: Could not create pdx registry


Stack Trace:

Exception in thread "main" org.apache.geode.pdx.PdxInitializationException: Could not create pdx registry
	at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:202)
	at org.apache.geode.pdx.internal.TypeRegistry.creatingDiskStore(TypeRegistry.java:267)
	at org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:160)
	at org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:817)
	at org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:808)
	at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:520)
	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:339)
	at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4115)
	at org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:208)
	at org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1409)
	at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1374)
	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:191)
	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
	at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
	at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:892)
	at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:807)
	at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:737)
	at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:256)
Caused by: org.apache.geode.cache.RegionExistsException: /PdxTypes
	at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2954)
	at org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78)
	at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:200)


This has been observed in VMware GemFire 9.7.2 when importing broken or manually altered cluster configs.

Note: This issue has been observed for newer versions too, such as VMware GemFire 9.9.1 and 9.10.1. 

The export is a ZIP file. Before importing it into the upgraded cluster, extract the earlier export and verify whether the cluster.xml file looks similar to the following. If it does, that is the problem, and the file needs to be fixed manually (refer to the workaround below).

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<cache version="1.0" xsi:schemaLocation="gpdb http://schema.pivotal.io/gemfire/gpdb/gpdb-3.3.xsd http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd http://geode.apache.org/schema/jdbc http://geode.apache.org/schema/jdbc/jdbc-1.0.xsd gpdb http://schema.pivotal.io/gemfire/gpdb/gpdb-3.3.xsd" xmlns="http://geode.apache.org/schema/cache" xmlns:gpdb="http://schema.pivotal.io/gemfire/gpdb" xmlns:jdbc="http://geode.apache.org/schema/jdbc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
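By contrast, a cleaned-up, empty cluster.xml would typically reduce to something like the following sketch, with only the Geode cache namespace declared (any real region or disk-store definitions from your cluster would of course be preserved):

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cache xmlns="http://geode.apache.org/schema/cache"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd"
       version="1.0"/>
```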


Another reason this error can occur is when group configs are created with Pdx definitions. Pdx is not supported for group-level configs and should not be added to the group-name.xml file.


Environment

Product Version: 9.7

Resolution

To fix the problem manually, you must clean the cluster.xml file. Follow the steps below:

1. Export the cluster-config from a running or healthy cluster before you start the upgrade of your members.

gfsh>export cluster-configuration --zip-file-name=clusterconfig.zip
File saved to /clusterconfig.zip

2. Extract the exported ZIP file and make sure to clean the cluster.xml file if it has the contents shown above.
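As a sanity check, you can count how many times the gpdb schemaLocation pair appears in the extracted cluster.xml; the broken export shown above lists it twice. A minimal sketch, which recreates the broken file inline for illustration (in practice, run the grep against the file extracted from clusterconfig.zip):

```shell
# Recreate the broken cluster.xml inline for illustration only.
cat > cluster.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cache version="1.0" xsi:schemaLocation="gpdb http://schema.pivotal.io/gemfire/gpdb/gpdb-3.3.xsd http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd gpdb http://schema.pivotal.io/gemfire/gpdb/gpdb-3.3.xsd" xmlns="http://geode.apache.org/schema/cache" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
EOF

# Count occurrences of the gpdb schemaLocation pair; a count of 2 or more
# matches the duplicated entry seen in broken exports.
grep -o 'gpdb http://schema.pivotal.io/gemfire/gpdb/gpdb-3.3.xsd' cluster.xml | wc -l | tr -d ' '
```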

3. Re-zip the whole folder again and then import the configs.

gfsh>import cluster-configuration --zip-file-name=clusterconfig.zip
This command will replace the existing cluster configuration, if any, The old configuration will be backed up in the working directory.

Continue?  (Y/n): yes
Cluster configuration successfully imported.


4. Start the cache server member that was taken down for the upgrade with its standard flags, as appropriate for your environment or cluster design.

5. Also make sure JAR files are not duplicated between the cluster configs and the global CLASSPATH. If a JAR is referenced via the CLASSPATH in your startup script, make sure the same JAR is not also part of the clusterconfig.zip bundle you imported.
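To spot such duplicates, you can compare the JAR names on the server's CLASSPATH against those inside the extracted cluster-config bundle. A minimal sketch, using illustrative directory and JAR names:

```shell
# Illustrative layout: classpath_jars/ stands in for the JARs on the
# server CLASSPATH, clusterconfig/cluster/ for the extracted ZIP bundle.
mkdir -p classpath_jars clusterconfig/cluster
touch classpath_jars/app-domain.jar clusterconfig/cluster/app-domain.jar clusterconfig/cluster/other.jar

# comm -12 prints only the names present in both sorted lists,
# i.e. the JARs that would be loaded twice.
ls classpath_jars | sort > classpath.list
ls clusterconfig/cluster | sort > bundle.list
comm -12 classpath.list bundle.list
```

Any JAR name printed by the last command should be removed from one of the two locations before restarting the server.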