This article explains why the "java.util.ConcurrentModificationException at Remote Member" exception can be hit when restarting a GemFire cache server, and which workarounds can be applied to resolve the issue.
When several cache servers are started at the same time, cache server A throws a ConcurrentModificationException while initializing a region, as shown in the log file excerpt below. Cache server A fails to start while the other cache servers start successfully. When cache server A is restarted, it starts successfully.
[info 2016/08/16 09:17:43.152 JST tid=0x1] Initializing region exampleRegionX
[info 2016/08/16 09:17:43.219 JST tid=0x1] Region exampleRegionX requesting initial image from 192.168.1.10(27972):10344
[info 2016/08/16 09:17:43.260 JST tid=0x1] exampleRegionX failed to get image from 192.168.1.10(27972):10344
[warning 2016/08/16 09:17:43.264 JST tid=0x1] Initialization failed for Region /exampleRegionX
com.gemstone.gemfire.ToDataException: toData failed on DataSerializer with id=0 for class class java.util.HashMap
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer.writeUserObject(InternalDataSerializer.java:1482)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer.writeWellKnownObject(InternalDataSerializer.java:1411)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer.basicWriteObject(InternalDataSerializer.java:2203)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.DataSerializer.writeObject(DataSerializer.java:3179)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.util.BlobHelper.serializeTo(BlobHelper.java:65)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.cache.AbstractRegionEntry.fillInValue(AbstractRegionEntry.java:342)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.chunkEntries(InitialImageOperation.java:1959)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.process(InitialImageOperation.java:1741)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:386)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:457)
    at Remote Member '192.168.1.10(27972):10344' in java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at Remote Member '192.168.1.10(27972):10344' in java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:692)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionManager$5$1.run(DistributionManager.java:1000)
    at Remote Member '192.168.1.10(27972):10344' in java.lang.Thread.run(Thread.java:745)
    at com.gemstone.gemfire.distributed.internal.ReplyException.handleAsUnexpected(ReplyException.java:75)
    at com.gemstone.gemfire.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:525)
    at com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1421)
    at com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1209)
    at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2983)
    at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2880)
    at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2869)
    at com.gemstone.gemfire.cache.RegionFactory.create(RegionFactory.java:841)
    at com.customer.framework.cache.impl.gemfire.CacheServerCacheManager.afterConnect(CacheServerCacheManager.java:147)
    at com.customer.framework.cache.impl.gemfire.GemFireCacheManager.(GemFireCacheManager.java:118)
    at com.customer.framework.cache.impl.gemfire.CacheServerCacheManager.(CacheServerCacheManager.java:95)
    at com.customer.framework.cache.impl.gemfire.GemFireCacheManager.doInit(GemFireCacheManager.java:73)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.customer.framework.cache.CacheManager.init(CacheManager.java:83)
    at com.customer.framework.process.Server.start(Server.java:139)
    at com.customer.framework.process.Server.execute(Server.java:104)
    at com.customer.framework.process.CacheServer.main(CacheServer.java:31)
Caused by: java.util.ConcurrentModificationException
    at Remote Member '192.168.1.10(27972):10344' in java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
    at Remote Member '192.168.1.10(27972):10344' in java.util.HashMap$EntryIterator.next(HashMap.java:1463)
    at Remote Member '192.168.1.10(27972):10344' in java.util.HashMap$EntryIterator.next(HashMap.java:1461)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.DataSerializer.writeHashMap(DataSerializer.java:2603)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer$32.toData(InternalDataSerializer.java:508)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer.writeUserObject(InternalDataSerializer.java:1451)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer.writeWellKnownObject(InternalDataSerializer.java:1411)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.InternalDataSerializer.basicWriteObject(InternalDataSerializer.java:2203)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.DataSerializer.writeObject(DataSerializer.java:3179)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.util.BlobHelper.serializeTo(BlobHelper.java:65)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.cache.AbstractRegionEntry.fillInValue(AbstractRegionEntry.java:342)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.chunkEntries(InitialImageOperation.java:1959)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.process(InitialImageOperation.java:1741)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:386)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:457)
    at Remote Member '192.168.1.10(27972):10344' in java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at Remote Member '192.168.1.10(27972):10344' in java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:692)
    at Remote Member '192.168.1.10(27972):10344' in com.gemstone.gemfire.distributed.internal.DistributionManager$5$1.run(DistributionManager.java:1000)
    at Remote Member '192.168.1.10(27972):10344' in java.lang.Thread.run(Thread.java:745)
[info 2016/08/16 09:17:43.358 JST tid=0xe] VM is exiting - shutting down distributed system
"java.util.ConcurrentModificationException" is a common exception when working with Java collection classes such as the HashMap class. It can be thrown in multi-threaded as well as single-threaded code. For example, when a collection is structurally changed by one thread while another thread is traversing it with an iterator, the next call to iterator.next() throws the exception.
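The single-threaded case is easy to reproduce. The following is a minimal sketch using only the JDK; the class and method names (CmeDemo, modifyWhileIterating) are illustrative and not taken from GemFire or from the affected application.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {

    /** Returns true if adding keys during iteration raised the exception. */
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Adding a new key is a structural modification; the
                // iterator detects it on its next call to next().
                map.put("new-" + key, 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + modifyWhileIterating());
    }
}
```

With two threads the same check fires when one thread modifies the map while the other is in the middle of iterating it, but the timing makes that case non-deterministic.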
In the case of the above stack, the failed node was trying to get an initial image (GII) from Remote Member '192.168.1.10(27972):10344'. The remote member threw java.util.ConcurrentModificationException from java.util.HashMap$HashIterator.nextNode while it was iterating over a HashMap held in a region entry in order to serialize it (DataSerializer.writeHashMap), because another thread was changing the same HashMap at that moment as part of an add/put/destroy/invalidate operation.
The ConcurrentModificationException is an expected exception in the described situation. To avoid this exception and the related issues when starting cache servers, the following could be applied:
Enable the copy-on-read parameter:
Using cache.xml:
<cache copy-on-read="true">
Using the GemFire Java API:
Cache c = CacheFactory.getInstance(system);
c.setCopyOnRead(true);
You can find more details on copy-on-read in the GemFire User's Guide.
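To illustrate why copy-on-read helps, here is a minimal sketch in plain Java. ToyRegion is a hypothetical stand-in written for this example and is not the GemFire Region API; it only mimics the documented idea that, with copy-on-read enabled, a read returns a copy of the cached value instead of a direct reference to it.

```java
import java.util.HashMap;
import java.util.Map;

public class CopyOnReadSketch {

    // Hypothetical stand-in for a region; NOT the GemFire API.
    static class ToyRegion {
        private final Map<String, HashMap<String, String>> entries = new HashMap<>();
        private final boolean copyOnRead;

        ToyRegion(boolean copyOnRead) {
            this.copyOnRead = copyOnRead;
        }

        void put(String key, HashMap<String, String> value) {
            entries.put(key, value);
        }

        HashMap<String, String> get(String key) {
            HashMap<String, String> stored = entries.get(key);
            if (stored == null) {
                return null;
            }
            // With copy-on-read the caller never sees the stored object.
            return copyOnRead ? new HashMap<>(stored) : stored;
        }

        int storedSize(String key) {
            return entries.get(key).size();
        }
    }

    /** Size of the stored value after a caller edits what get() returned. */
    static int storedSizeAfterCallerEdit(boolean copyOnRead) {
        ToyRegion region = new ToyRegion(copyOnRead);
        HashMap<String, String> value = new HashMap<>();
        value.put("k1", "v1");
        region.put("order", value);

        // The application changes the map it obtained from the region.
        region.get("order").put("k2", "v2");
        return region.storedSize("order");
    }

    public static void main(String[] args) {
        System.out.println("copy-on-read off: " + storedSizeAfterCallerEdit(false));
        System.out.println("copy-on-read on:  " + storedSizeAfterCallerEdit(true));
    }
}
```

Without the copy, the caller's edit lands directly in the stored HashMap, which is the object a concurrent serializer would be iterating. With the copy, the stored map is untouched, so in a real cache the application would need to put the modified copy back into the region for the change to take effect.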
Changing the start order of the cache servers can also resolve this issue.