GemFire cluster hanging when "conserve-sockets=true"

Article ID: 294233

Products

VMware Tanzu GemFire

Issue/Introduction

Symptoms:

A GemFire cluster may hang when conserve-sockets=true is set and the cluster is under high load. When it hangs, you may see the following symptoms:

A. A thread dump shows one thread writing to a peer socket while other threads are blocked on the lock it holds:

"ServerConnection on port 12480 Thread 929" tid=0x8fe (in native)
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
- locked java.lang.Object@241d89aa
at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3277)
- locked java.lang.Object@7d36d2bc
at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
at com.gemstone.gemfire.internal.tcp.MsgStreamer.realFlush(MsgStreamer.java:317)
at com.gemstone.gemfire.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:245)
at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:458)
at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:310)
at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.send(DirectChannel.java:696)
at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.directChannelSend(JGroupMembershipManager.java:2844)
at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.send(JGroupMembershipManager.java:3078)
at com.gemstone.gemfire.distributed.internal.DistributionChannel.send(DistributionChannel.java:79)
at com.gemstone.gemfire.distributed.internal.DistributionManager.sendOutgoing(DistributionManager.java:3780)
at com.gemstone.gemfire.distributed.internal.DistributionManager.sendMessage(DistributionManager.java:3821)
at com.gemstone.gemfire.distributed.internal.DistributionManager.putOutgoing(DistributionManager.java:1957)
at com.gemstone.gemfire.internal.cache.partitioned.DestroyMessage.send(DestroyMessage.java:213)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyRemotely(PartitionedRegion.java:5734)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyInBucket(PartitionedRegion.java:5552)
at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.destroyExistingEntry(PartitionedRegionDataView.java:45)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.basicDestroy(PartitionedRegion.java:5419)
at com.gemstone.gemfire.internal.cache.LocalRegion.validatedDestroy(LocalRegion.java:1143)
at com.gemstone.gemfire.internal.cache.LocalRegion.destroy(LocalRegion.java:1130)
at com.gemstone.gemfire.internal.cache.AbstractRegion.destroy(AbstractRegion.java:315)
at com.gemstone.gemfire.internal.cache.LocalRegion.remove(LocalRegion.java:9362)
......
"ServerConnection on port 12480 Thread 875" tid=0x8c5 owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
java.lang.Thread.State: BLOCKED
at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
- blocked on java.lang.Object@7d36d2bc
at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
at com.gemstone.gemfire.internal.tcp.MsgStreamer.realFlush(MsgStreamer.java:317)
at com.gemstone.gemfire.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:245)
at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:458)
at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:310)
at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.send(DirectChannel.java:696)
at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.directChannelSend(JGroupMembershipManager.java:2844)
at com.gemstone.gemfire.distributed.internal.membership.jgroup.JGroupMembershipManager.send(JGroupMembershipManager.java:3078)
at com.gemstone.gemfire.distributed.internal.DistributionChannel.send(DistributionChannel.java:79)
at com.gemstone.gemfire.distributed.internal.DistributionManager.sendOutgoing(DistributionManager.java:3780)
at com.gemstone.gemfire.distributed.internal.DistributionManager.sendMessage(DistributionManager.java:3821)
at com.gemstone.gemfire.distributed.internal.DistributionManager.putOutgoing(DistributionManager.java:1957)
at com.gemstone.gemfire.internal.cache.partitioned.DestroyMessage.send(DestroyMessage.java:213)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyRemotely(PartitionedRegion.java:5734)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.destroyInBucket(PartitionedRegion.java:5552)
at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.destroyExistingEntry(PartitionedRegionDataView.java:45)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.basicDestroy(PartitionedRegion.java:5419)
at com.gemstone.gemfire.internal.cache.LocalRegion.validatedDestroy(LocalRegion.java:1143)
at com.gemstone.gemfire.internal.cache.LocalRegion.destroy(LocalRegion.java:1130)
at com.gemstone.gemfire.internal.cache.AbstractRegion.destroy(AbstractRegion.java:315)
at com.gemstone.gemfire.internal.cache.LocalRegion.remove(LocalRegion.java:9362)
......
"ServerConnection on port 12480 Thread 873" tid=0x8c3 owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
java.lang.Thread.State: BLOCKED
at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
- blocked on java.lang.Object@7d36d2bc
at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
......
"ServerConnection on port 12480 Thread 1394" tid=0xaee owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
java.lang.Thread.State: BLOCKED
at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
- blocked on java.lang.Object@7d36d2bc
at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
......
"PartitionedRegion Message Processor105" tid=0x768 owned by "ServerConnection on port 12480 Thread 929" tid=0x8fe
java.lang.Thread.State: BLOCKED
at com.gemstone.gemfire.internal.tcp.Connection.nioWriteFully(Connection.java:3264)
- blocked on java.lang.Object@7d36d2bc
at com.gemstone.gemfire.internal.tcp.Connection.sendPreserialized(Connection.java:2511)
...... 

B. The cacheserver log file contains many messages like the ones below:

[warn 2017/03/15 19:38:40.072 CST tid=0x4f4] 15 seconds have elapsed while waiting for replies: <PutMessage$PutResponse 2569 waiting for 1 replies from [......]

[warn 2017/03/15 19:38:40.072 CST tid=0x4bc] 15 seconds have elapsed while waiting for replies: <GetMessage$GetResponse 2571 waiting for 1 replies from [......]

[warn 2017/03/15 19:38:41.564 CST tid=0x5e8] 15 seconds have elapsed while waiting for replies: <com.gemstone.gemfire.internal.cache.PartitionedRegionQueryEvaluator$StreamingQueryPartitionResponse 2588 waiting for 1 replies from [......] 
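
For symptom A, a thread dump can be checked mechanically for many threads blocked on one lock. The following is a minimal sketch, assuming the dump uses the "- blocked on <lock>" line format shown above; the class and method names are hypothetical and not part of GemFire.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how many threads in a dump are blocked on each lock.
final class BlockedLockCounter {

    // Matches lines such as "- blocked on java.lang.Object@7d36d2bc".
    private static final Pattern BLOCKED_ON =
            Pattern.compile("^\\s*- blocked on (\\S+)\\s*$");

    static Map<String, Integer> countBlockedPerLock(String threadDump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : threadDump.split("\\R")) {
            Matcher m = BLOCKED_ON.matcher(line);
            if (m.matches()) {
                // One more thread is waiting for this lock.
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "- locked java.lang.Object@7d36d2bc",
                "- blocked on java.lang.Object@7d36d2bc",
                "- blocked on java.lang.Object@7d36d2bc");
        System.out.println(countBlockedPerLock(sample));
    }
}
```

A high count for a single lock, combined with a RUNNABLE thread that holds it inside Connection.nioWriteFully, matches the pattern in the dump above.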

Environment


Cause

The thread dump and log messages above show that peer-to-peer communication in the GemFire cluster is stuck at a synchronization point in Connection.nioWriteFully. Thread 929 holds the lock java.lang.Object@7d36d2bc while it writes to the socket, and the other threads are blocked waiting for that same lock. This blocking occurs because application threads share sockets when conserve-sockets=true.
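
The contention described above can be illustrated with a small, self-contained simulation. It is only a sketch and does not use GemFire code: the class name, the method, and the stall time are hypothetical, and a plain Java monitor stands in for the lock taken in Connection.nioWriteFully. When all writers share one lock (the conserve-sockets=true analogue), only one thread can be inside the write at a time, so a slow write stalls every other sender.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of writers that share one connection lock versus
// writers that each have their own. This is not GemFire code.
final class SharedConnectionSketch {

    /** Returns the largest number of threads that were inside the "write" at once. */
    static int peakConcurrentWriters(boolean shareConnection, int threadCount, long stallMillis) {
        final Object sharedLock = new Object();
        final AtomicInteger inside = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        // Reaches zero only after every writer has entered its write.
        final CountDownLatch allInside = new CountDownLatch(threadCount);

        Thread[] writers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final Object lock = shareConnection ? sharedLock : new Object();
            writers[i] = new Thread(() -> {
                synchronized (lock) {
                    peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
                    allInside.countDown();
                    try {
                        // Simulated slow socket write: stall until all writers
                        // are inside or the stall time elapses.
                        allInside.await(stallMillis, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    inside.decrementAndGet();
                }
            });
            writers[i].start();
        }
        for (Thread writer : writers) {
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("shared lock peak writers:   " + peakConcurrentWriters(true, 3, 300));
        System.out.println("per-thread lock peak writers: " + peakConcurrentWriters(false, 3, 300));
    }
}
```

With a shared lock the peak is limited to one writer, and each stalled write delays all the others in turn. With one lock per thread the writers can proceed in parallel.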

Resolution

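Changing conserve-sockets from its default value of true to false can prevent this hang, because application threads then use their own sockets for peer-to-peer communication instead of contending for shared ones.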
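
As an illustration, the setting can be prepared as an ordinary Java Properties object and rendered in gemfire.properties form. This is a minimal sketch: the class and method names are hypothetical, only the property name conserve-sockets comes from this article, and how the properties reach the member (for example, a gemfire.properties file or the cache factory) depends on your deployment.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for preparing member properties; not part of GemFire.
final class ConserveSocketsConfig {

    static final String CONSERVE_SOCKETS = "conserve-sockets";

    /** Returns a copy of the given properties with conserve-sockets set to false. */
    static Properties withDedicatedSockets(Properties base) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty(CONSERVE_SOCKETS, "false");
        return copy;
    }

    /** Renders the properties in the key=value format used by properties files. */
    static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = withDedicatedSockets(new Properties());
        // The rendered text can be saved as the member's properties file.
        System.out.print(render(props));
    }
}
```

Every member of the cluster needs the changed setting, and a member typically has to be restarted for it to take effect. Because conserve-sockets=false uses more sockets per member, check operating system limits on open file descriptors before rolling it out.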