OverlappingFileLockException on start up after removal of non-empty region

Article ID: 294274

Products

VMware Tanzu GemFire

Issue/Introduction

Symptoms:

When you remove a non-empty region (whose keys are custom classes) directly from the `cache.xml` file and then start the server as usual, either through the Java API or with GFSH, startup fails with a `java.nio.channels.OverlappingFileLockException`.

The complete stack trace is similar to the following:

Caused by: com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: DEFAULT: Could not lock "./DRLK_IFDEFAULT.lk". Other JVMs might have created diskstore with same name using the same directory., caused by java.nio.channels.OverlappingFileLockException
 at com.gemstone.gemfire.internal.cache.DiskStoreImpl.createLockFile(DiskStoreImpl.java:1869)
 at com.gemstone.gemfire.internal.cache.DiskStoreImpl.loadFiles(DiskStoreImpl.java:1959)
 at com.gemstone.gemfire.internal.cache.DiskStoreImpl.<init>(DiskStoreImpl.java:473)
 at com.gemstone.gemfire.internal.cache.DiskStoreImpl.<init>(DiskStoreImpl.java:386)
 at com.gemstone.gemfire.internal.cache.DiskStoreImpl.<init>(DiskStoreImpl.java:381)
 at com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:131)
 at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.getOrCreateDefaultDiskStore(GemFireCacheImpl.java:2317)
 at com.gemstone.gemfire.internal.cache.LocalRegion.findDiskStore(LocalRegion.java:7991)
 at com.gemstone.gemfire.internal.cache.LocalRegion.<init>(LocalRegion.java:620)
 at com.gemstone.gemfire.internal.cache.DistributedRegion.<init>(DistributedRegion.java:193)
 at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2963)
 at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2891)
 at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2880)
 at com.gemstone.gemfire.cache.RegionFactory.create(RegionFactory.java:841)

Environment


Cause

Several exceptions that are currently not caught during disk-store recovery cause the recovery to fail without terminating the GemFire process. The internal lock file used by GemFire stays open, so later attempts by the same JVM to create the disk store also fail, and the process exits with the lock exception shown above instead of a more descriptive one.
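The lock behavior described above can be reproduced outside GemFire with the standard `java.nio` API. The following is a minimal sketch, not GemFire's actual implementation: the class name `OverlappingLockDemo`, the method `secondLockOverlaps`, and the temporary file name are hypothetical. It shows only that a second lock attempt from the same JVM fails while the first lock on the file is still held.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverlappingLockDemo {

    /**
     * Locks the file once, then tries to lock it again from the same JVM
     * while the first lock is still held. Returns true if the second
     * attempt fails with OverlappingFileLockException.
     */
    static boolean secondLockOverlaps(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            // First attempt: stands in for the recovery that failed but
            // left the lock file open. tryLock() returns null only if
            // another process already holds the lock.
            FileLock held = first.tryLock();
            if (held == null) {
                throw new IOException("Lock is held by another process: " + lockFile);
            }

            try {
                // Second attempt from the same JVM while the first lock is held.
                second.tryLock();
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            }
        } // closing the channels releases the lock
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; GemFire manages its own lock files.
        Path lockFile = Files.createTempFile("lock-demo", ".lk");
        try {
            System.out.println("second lock overlaps: " + secondLockOverlaps(lockFile));
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```

Because the JVM tracks its own file locks, the second attempt fails regardless of whether any other process is involved, which is why the "Other JVMs might have created diskstore" message in the trace can be misleading in this scenario.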

To confirm that a failed recovery is the root cause, find the initial recovery attempt in the logs and the cause of that first failure. For example:

Caused by: com.gemstone.gemfire.pdx.PdxSerializationException: Could not create an instance of a class io.pivotal.support.model.CustomKey
at com.gemstone.gemfire.pdx.internal.PdxType.getPdxClass(PdxType.java:227)
at com.gemstone.gemfire.pdx.internal.PdxReaderImpl.basicGetObject(PdxReaderImpl.java:676)
at com.gemstone.gemfire.pdx.internal.PdxReaderImpl.getObject(PdxReaderImpl.java:672)
at com.gemstone.gemfire.internal.InternalDataSerializer.readPdxSerializable(InternalDataSerializer.java:3186)
at com.gemstone.gemfire.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2984)
at com.gemstone.gemfire.DataSerializer.readObject(DataSerializer.java:3212)
at com.gemstone.gemfire.internal.util.BlobHelper.deserializeBlob(BlobHelper.java:101)
at com.gemstone.gemfire.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:1554)
at com.gemstone.gemfire.internal.cache.Oplog.deserializeKey(Oplog.java:7774)
at com.gemstone.gemfire.internal.cache.Oplog.readKrf(Oplog.java:1810)
at com.gemstone.gemfire.internal.cache.Oplog.recoverCrf(Oplog.java:2267)
at com.gemstone.gemfire.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:459)
at com.gemstone.gemfire.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:367)
at com.gemstone.gemfire.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2065)
at com.gemstone.gemfire.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2052)
at com.gemstone.gemfire.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2057)
at com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:135)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.getOrCreateDefaultDiskStore(GemFireCacheImpl.java:2317)
at com.gemstone.gemfire.internal.cache.LocalRegion.findDiskStore(LocalRegion.java:7991)
at com.gemstone.gemfire.internal.cache.LocalRegion.<init>(LocalRegion.java:620)
at com.gemstone.gemfire.internal.cache.DistributedRegion.<init>(DistributedRegion.java:193)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2963)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2891)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2880)
at com.gemstone.gemfire.cache.RegionFactory.create(RegionFactory.java:841)
... 103 more
Caused by: java.lang.ClassNotFoundException: io.pivotal.support.model.CustomKey
at com.gemstone.gemfire.internal.ClassPathLoader.forName(ClassPathLoader.java:422)
at com.gemstone.gemfire.internal.InternalDataSerializer.getCachedClass(InternalDataSerializer.java:4058)
at com.gemstone.gemfire.pdx.internal.PdxType.getPdxClass(PdxType.java:225)

RCA

When you simply remove the region declaration from `cache.xml`, the disk store is still recovered, and the keys from that region are still loaded into an internal, temporary map. This means the custom key class must be on the member's classpath even if the region is no longer used. You must either keep the class on the server's classpath or destroy the region.

As a side note, this map is never used because the region is never created, so it only wastes memory and resources.
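If you choose to keep the class on the classpath, a quick pre-start check can tell you whether a given key class is resolvable. The following is a minimal sketch using only the standard class loader: the class `KeyClassCheck` and the method `isOnClasspath` are hypothetical, and the check does not account for any additional class-loading mechanisms GemFire itself may use (for example, deployed JARs).

```java
public class KeyClassCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader. The class is not initialized, only located.
     */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, KeyClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the example stack trace above; pass your
        // own key class as the first argument to check a different one.
        String keyClass = args.length > 0 ? args[0] : "io.pivotal.support.model.CustomKey";
        System.out.println(keyClass + " on classpath: " + isOnClasspath(keyClass));
    }
}
```

Run it with the same classpath you use to start the member; a `false` result suggests that recovery of the removed region's keys would fail with the `ClassNotFoundException` shown in the trace.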

Resolution

To remove the region from the disk store, you have two options:

  1. While the member is online and the region still exists, invoke `Region.destroyRegion()`. This removes all of the region's data from the disk store. If you are using cluster configuration, you can do this with the gfsh `destroy region` command, which also removes the region from the cluster configuration.
  2. While the member is offline and the region has already been removed from the `cache.xml` file, delete the region from the disk store by running the gfsh `alter disk-store --remove` command.