Member Failure is Detected with the Message, "Reason=FD_SOCK tcp/ip Health Check"

Article ID: 294120

Updated On:

Products

VMware Tanzu GemFire

Issue/Introduction

Symptoms:

The member failure detection mechanism of GemFire 8.2.x and earlier is based on the FD (UDP) and FD_SOCK (TCP) protocols derived from JGroups. However, GemFire's failure detection behavior may differ slightly from that of the original JGroups FD_SOCK mechanism.

Member failure is detected with the following message, found in the locator or server logs.

Log Message:

[info 2017/05/22 02:21:02.874 GMT+09:00 locator2 <UDP ucast receiver> tid=0x2b] Received Suspect notification
for member(s) [192.168.100.3(server3:25509)<v7>:28042] from 192.168.100.2(server2:27375)<v5>:28157. 
Reason=FD_SOCK tcp/ip health check
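
When scanning large logs for this symptom, a small parser can help pull out the suspected member, the reporting member, and the reason. The following is a minimal sketch, assuming the entry has the same shape as the sample above; the class name SuspectLogParser and its methods are hypothetical, and the pattern is inferred from that single sample rather than from a documented log format.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the pattern is inferred from the sample entry in this article. */
public class SuspectLogParser {

    /** The sample entry from this article, including its line breaks. */
    static final String SAMPLE =
        "[info 2017/05/22 02:21:02.874 GMT+09:00 locator2 <UDP ucast receiver> tid=0x2b] "
        + "Received Suspect notification\n"
        + "for member(s) [192.168.100.3(server3:25509)<v7>:28042] "
        + "from 192.168.100.2(server2:27375)<v5>:28157. \n"
        + "Reason=FD_SOCK tcp/ip health check";

    private static final Pattern SUSPECT = Pattern.compile(
        "Received Suspect notification for member\\(s\\) \\[(.+?)\\] from (\\S+?)\\. Reason=(.+)");

    /** Returns {suspected member(s), reporting member, reason}, or empty if the entry does not match. */
    static Optional<String[]> parse(String logEntry) {
        // Collapse line breaks and repeated spaces so a wrapped entry matches as one line.
        String normalized = logEntry.replaceAll("\\s+", " ").trim();
        Matcher m = SUSPECT.matcher(normalized);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3).trim() });
    }

    /** True if the reason is the one discussed in this article. */
    static boolean isTcpHealthCheck(String reason) {
        return "FD_SOCK tcp/ip health check".equalsIgnoreCase(reason);
    }

    public static void main(String[] args) {
        parse(SAMPLE).ifPresent(f -> {
            System.out.println("suspected: " + f[0]);
            System.out.println("reporter:  " + f[1]);
            System.out.println("tcp check: " + isTcpHealthCheck(f[2]));
        });
    }
}
```

The parser takes one log entry at a time; splitting a log file into entries is left out because it depends on how the log is wrapped.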

Environment


Cause

In this case, the member failure is detected by "RandomPingTask," which is an enhancement to the FD_SOCK implementation made by Pivotal. In addition to the original FD_SOCK mechanism, each GemFire member periodically tries to establish a TCP connection with randomly selected other members. If the connection cannot be established, the target member is suspected to have failed, and the log message appears with the reason "FD_SOCK tcp/ip health check."
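
The idea of a random TCP health check can be illustrated with plain Java sockets. The following is a minimal sketch only, not GemFire's actual RandomPingTask: the class name RandomTcpHealthCheck, its methods, the choice of a single target per run, and the 2000 ms timeout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Illustrative sketch only; this is not GemFire's RandomPingTask implementation. */
public class RandomTcpHealthCheck {

    /** Returns true if a TCP connection to the target is established within timeoutMillis. */
    static boolean canConnect(InetSocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(target, timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, unresolved, or timed out: the check failed.
            return false;
        }
    }

    /**
     * Picks one member at random (an assumption of this sketch) and probes it.
     * Returns the member if the probe failed, so the caller can treat it as suspect.
     */
    static Optional<InetSocketAddress> checkRandomMember(
            List<InetSocketAddress> others, Random random, int timeoutMillis) {
        if (others.isEmpty()) {
            return Optional.empty();
        }
        InetSocketAddress target = others.get(random.nextInt(others.size()));
        return canConnect(target, timeoutMillis) ? Optional.empty() : Optional.of(target);
    }

    public static void main(String[] args) {
        // Each argument is expected in host:port form.
        List<InetSocketAddress> others = new ArrayList<>();
        for (String arg : args) {
            int colon = arg.lastIndexOf(':');
            others.add(new InetSocketAddress(
                arg.substring(0, colon), Integer.parseInt(arg.substring(colon + 1))));
        }
        checkRandomMember(others, new Random(), 2000)
            .ifPresent(m -> System.out.println("Suspect: " + m + " (TCP connect failed)"));
    }
}
```

A real failure detector would run such a check on a schedule and confirm a suspect with further checks before removing the member; the sketch shows only the single probe that leads to a suspicion.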

Resolution

  • If you set disable-tcp=true in gemfire.properties, this "RandomPingTask" failure detection mechanism is disabled.
  • The member failure detection mechanism has been revised in GemFire 9.0, so this article is not applicable to GemFire 9.0 or later.