VMware Tanzu GemFire servers do not come up while retrieving data from the disk stores

Article ID: 294329

Updated On:

Products

VMware Tanzu GemFire

Issue/Introduction

When a VMware Tanzu GemFire server with a persistent region starts, it reads data from its disk stores to recreate the member's persistent region. During this recovery, the server(s) may show one or both of the following issues:
  • Slow startup of servers
  • Servers fail to come up successfully


Symptoms:

The following symptoms are observed:
  • Slow startup of VMware Tanzu GemFire servers
  • VMware Tanzu GemFire servers fail to come up successfully

Environment

Product Version: 9.1

Cause


If the server logs contain a stuck-thread message whose thread stack begins with a frame similar to "java.io.RandomAccessFile.seek0(Native Method)" or "sun.nio.ch.FileDispatcherImpl.write0(Native Method)", the thread is blocked in a native I/O call, and the persistent disk may be slow to respond.

Resolution

Check whether the disks that host the disk stores are slow to respond.
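
One rough way to check disk responsiveness is to time small synchronous writes in the directory that holds the disk store files. The sketch below is not a GemFire tool and is only an illustration: the function name `measure_fsync_latency`, the block size, and the sample count are assumptions, and this article does not define what latency counts as "slow". Compare the numbers against a known healthy host or the storage vendor's specification.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, block_size=4096, count=50):
    """Return per-write latencies (seconds) for small synced writes in `directory`.

    Hypothetical helper: it writes to a temporary probe file and removes it
    afterwards, so it does not touch any existing disk store files.
    """
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="latency-probe-")
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush the write to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies
```

For example, calling `measure_fsync_latency("/path/to/diskstore/dir")` (a placeholder path) and looking at `max(...)` and the average of the returned list gives a first impression of whether the volume stalls on writes. Operating system tools that report device wait times give a more complete picture.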

Checklist:
To troubleshoot a disk-related issue, check the VMware Tanzu GemFire server logs for stuck-thread messages such as the following:
 
[warn 2021/04/22 14:20:25.392 CDT <ThreadsMonitor> tid=0x15] Thread <1497632> that was executed at <22 Apr 2021 14:10:52 CDT> has been stuck for <572.901 seconds> and number of thread monitor iteration <10>
Thread Name <Pooled High Priority Message Processor 3168>
Thread state <RUNNABLE>
Executor Group <PooledExecutorWithDMStats>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread Stack:
sun.nio.ch.FileDispatcherImpl.write0(Native Method)
sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
sun.nio.ch.IOUtil.write(IOUtil.java:51)
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
org.apache.geode.internal.tcp.Connection.nioWriteFully(Connection.java:3291)

Or
[warn 2020/11/25 04:49:09.455 EST tid=0x1a] Thread (0x74) that was executed at has been stuck for and number of thread monitor iteration Thread Name state Executor Group Monitored metric Thread Stack:
java.io.RandomAccessFile.seek0(Native Method)
java.io.RandomAccessFile.seek(RandomAccessFile.java:557)
org.apache.geode.internal.cache.persistence.UninterruptibleRandomAccessFile.seek(UninterruptibleRandomAccessFile.java:77)
org.apache.geode.internal.cache.Oplog.attemptGet(Oplog.java:5347)
org.apache.geode.internal.cache.Oplog.basicGet(Oplog.java:5419)
org.apache.geode.internal.cache.Oplog.getBytesAndBits(Oplog.java:1247)
org.apache.geode.internal.cache.DiskStoreImpl.getBytesAndBitsWithoutLock(DiskStoreImpl.java:957)
org.apache.geode.internal.cache.DiskStoreImpl.getRaw(DiskStoreImpl.java:877)
org.apache.geode.internal.cache.AbstractDiskRegion.getRaw(AbstractDiskRegion.java:1041)
org.apache.geode.internal.cache.entries.DiskEntry$Helper.getValueFromDisk(DiskEntry.java:1191)
org.apache.geode.internal.cache.entries.DiskEntry$Helper.readValueFromDisk(DiskEntry.java:1255)
org.apache.geode.internal.cache.entries.DiskEntry$Helper.faultInValue(DiskEntry.java:1114)
org.apache.geode.internal.cache.entries.DiskEntry$Helper.faultInValue(DiskEntry.java:1067)
org.apache.geode.internal.cache.entries.AbstractOplogDiskRegionEntry.getValue(AbstractOplogDiskRegionEntry.java:80)
org.apache.geode.internal.cache.LocalRegion.getDeserialized(LocalRegion.java:1251)
org.apache.geode.internal.cache.NonTXEntry.getValue(NonTXEntry.java:91)
org.apache.geode.internal.cache.LocalDataSet$LocalEntriesSet$LocalEntriesSetIterator.moveNext(LocalDataSet.java:781)
org.apache.geode.internal.cache.LocalDataSet$LocalEntriesSet$LocalEntriesSetIterator.next(LocalDataSet.java:742)
org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:834)
org.apache.geode.cache.query.internal.CompiledSelect.doIterationEvaluate(CompiledSelect.java:701)
org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:545)
org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:430)
org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:246)
org.apache.geode.internal.cache.PRQueryProcessor.executeSequentially(PRQueryProcessor.java:212)
org.apache.geode.internal.cache.PRQueryProcessor.executeQuery(PRQueryProcessor.java:122)
org.apache.geode.internal.cache.PartitionedRegionQueryEvaluator.executeQueryOnLocalNode(PartitionedRegionQueryEvaluator.java:962)
org.apache.geode.internal.cache.PartitionedRegionQueryEvaluator.executeQueryOnRemoteAndLocalNodes(PartitionedRegionQueryEvaluator.java:378)
org.apache.geode.internal.cache.PartitionedRegionQueryEvaluator.queryBuckets(PartitionedRegionQueryEvaluator.java:495)
org.apache.geode.internal.cache.PartitionedRegion.doExecuteQuery(PartitionedRegion.java:2054)
org.apache.geode.internal.cache.PartitionedRegion.executeQuery(PartitionedRegion.java:1981)
org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:831)
com.adc.ai.cache.server.function.ClearLocalRegionAdHocFunction.query(ClearLocalRegionAdHocFunction.java:55)
com.adc.ai.cache.server.function.ClearLocalRegionAdHocFunction.execute(ClearLocalRegionAdHocFunction.java:45)
org.apache.geode.internal.cache.PartitionedRegionDataStore.executeOnDataStore(PartitionedRegionDataStore.java:2993)
org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.operateOnPartitionedRegion(PartitionedRegionFunctionStreamingMessage.java:97)
org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:330)
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:365)
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:429)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
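
Searching large server logs for these messages by hand is tedious, so the check above can be sketched as a small script. This is a minimal illustration, not a GemFire utility: the function name `find_stuck_io_threads` is hypothetical, the entry-start pattern is inferred only from the two log excerpts in this article, and the script looks for just the two native frames quoted here.

```python
import re

# Native I/O frames that this article associates with a slow-responding disk.
IO_FRAMES = (
    "java.io.RandomAccessFile.seek0",
    "sun.nio.ch.FileDispatcherImpl.write0",
)

# Assumption: every log entry starts with "[level yyyy/mm/dd ", as in the excerpts.
ENTRY_START = re.compile(r"^\[\w+ \d{4}/\d{2}/\d{2} ", re.MULTILINE)
STUCK_FOR = re.compile(r"has been stuck for <([\d.]+) seconds>")


def find_stuck_io_threads(log_text):
    """Return stuck-thread log entries whose stack contains one of IO_FRAMES."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    starts.append(len(log_text))
    hits = []
    for begin, end in zip(starts, starts[1:]):
        entry = log_text[begin:end]
        if "has been stuck for" not in entry:
            continue  # not a stuck-thread warning
        frames = [frame for frame in IO_FRAMES if frame in entry]
        if not frames:
            continue  # stuck, but not in one of the listed I/O frames
        duration = STUCK_FOR.search(entry)
        hits.append({
            "header": entry.splitlines()[0],
            # None when the duration is missing from the log line
            "stuck_seconds": float(duration.group(1)) if duration else None,
            "io_frames": frames,
        })
    return hits
```

To use it, read a server log into a string (for example `open("server.log").read()`, where the file name is a placeholder) and pass it to `find_stuck_io_threads`. Many hits with long stuck durations on the same member are a reason to look more closely at the storage behind that member's disk stores.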