The purpose of this article is to help you recognize a specific exception you may find in the GemFire logs related to the sending and receiving of buffers across the WAN, and to explain what this exception means. Specifically, the following exception:
Unexpected IOException: java.io.IOException: Part length (-2,000,248,199) and number of parts (18,001) inconsistent ...
This issue has been corrected in GemFire versions 8 and later.
If the batch size is sufficiently large, and the individual events are large as well, the sender may attempt to ship a message greater than 2 GB. The receiver then fails with an exception when it attempts to process that larger-than-expected buffer:
[warning 2015/10/07 21:06:29.213 UTC gemfire-gfirep03-49002 tid=0xc3af9] Server connection from [identity(10.x.y.z(gemfire-gfirep08-49002:23053):51883,connection=13; port=18172]: Unexpected IOException: java.io.IOException: Part length ( -2,000,248,199 ) and number of parts ( 18,001 ) inconsistent
  at com.gemstone.gemfire.internal.cache.tier.sockets.Message.readPayloadFields(Message.java:793)
  at com.gemstone.gemfire.internal.cache.tier.sockets.Message.readHeaderAndPayload(Message.java:742)
  at com.gemstone.gemfire.internal.cache.tier.sockets.Message.read(Message.java:587)
  at com.gemstone.gemfire.internal.cache.tier.sockets.Message.recv(Message.java:1087)
  at com.gemstone.gemfire.internal.cache.tier.sockets.Message.recv(Message.java:1101)
  at com.gemstone.gemfire.internal.cache.tier.sockets.BaseCommand.readRequest(BaseCommand.java:996)
  at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:770)
  at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:952)
  at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1206)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:532)
  at java.lang.Thread.run(Thread.java:744)
This is the exception shown above in the Symptoms section. Nothing on the sender side keeps track of the total size of the message that will be sent, so if the events being placed into the buffer are large, the accumulated payload can surpass the 2 GB limit before the buffer is ultimately shipped. The receiving distributed system then fails with this exception when it attempts to read the batch.
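The negative part length in the log is the telltale sign of what happened: the values suggest a signed 32-bit overflow, since -2,000,248,199 + 2^32 = 2,294,719,097 bytes, roughly 2.3 GB. Below is a minimal sketch (illustrative only, not GemFire internals) showing how a payload just past the 2 GB limit wraps to the negative value seen in the exception:

public class PartLengthOverflowDemo {
    public static void main(String[] args) {
        // Hypothetical batch payload of ~2.29 GB, just past Integer.MAX_VALUE
        // (2,147,483,647 bytes, the 2 GB boundary).
        long actualBytes = 2_294_719_097L;

        // Narrowing the length to a signed 32-bit int wraps it negative,
        // matching the part length reported in the exception above.
        int encodedLength = (int) actualBytes;

        System.out.println(encodedLength); // prints -2000248199
    }
}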
To work around this issue, set the batch-size on your gateway sender to a value that guarantees, for your expected event size, that you will not exceed the 2 GB threshold for the WAN buffer being shipped to the other distributed system. As a rule of thumb, the batch size multiplied by your average event size should stay well below 2 GB.
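For example, here is a minimal sketch of lowering the batch size through the Java API; the sender id "sender1", the remote distributed-system-id of 2, and the batch size of 100 are illustrative assumptions, and the right value depends on your own average event size:

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.wan.GatewaySender;
import com.gemstone.gemfire.cache.wan.GatewaySenderFactory;

public class WanBatchSizeConfig {
    public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        // If events average ~2 MB, a batch of 100 keeps each WAN message
        // near 200 MB, comfortably below the 2 GB limit.
        GatewaySenderFactory factory = cache.createGatewaySenderFactory();
        factory.setBatchSize(100); // number of events per batch

        // "sender1" and remote distributed-system-id 2 are placeholders.
        GatewaySender sender = factory.create("sender1", 2);
    }
}

The same setting is also available declaratively as the batch-size attribute of the gateway-sender element in cache.xml.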
Recovery
Once this issue has been experienced, recovery is difficult without shutting down the involved nodes. We recommend working with one of our Global Support Team members to confirm the issue; it may then become necessary to revoke some disk stores to bring the systems back to a full production state. When performing such actions, we recommend the assistance of one of our Support Engineers, so please open a Critical ticket to expedite resolution of this issue.