Hazelcast - java.net.SocketTimeoutException - CONGW 10.1

Article ID: 237051

Products

CA API Gateway

Issue/Introduction

The Hazelcast warnings below appear continuously in the Gateway logs. The Helm chart has these Hazelcast settings:

enabled: false
external: false

************Log Start**********

2022-03-12T02:22:04.838+0000 WARNING 122    com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager: [XX.XX.XX.XXX]:8777 [gateway] [3.12.5] This node does not have a connection to Member [XX.XX.XX.XX]:8777 - 109a01dc-0b33-4303-b56c-2ceb7beeecad
2022-03-12T02:22:04.838+0000 WARNING 122    com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager: [XX.XX.XX.XXX]:8777 [gateway] [3.12.5] This node does not have a connection to Member [XX.XX.XX.XX]:8777 - a2a883b1-e133-49cb-a3a8-bf69b863b485
2022-03-12T02:22:06.075+0000 WARNING 124    com.hazelcast.nio.tcp.TcpIpConnectionErrorHandler: [XX.XX.XX.XXX]:8777 [gateway] [3.12.5] Removing connection to endpoint [10.67.21.16]:8777 Cause => java.net.SocketTimeoutException {null}, Error-Count: 55
2022-03-12T02:22:06.176+0000 WARNING 109    com.hazelcast.nio.tcp.TcpIpConnectionErrorHandler: [XX.XX.XX.XXX]:8777 [gateway] [3.12.5] Removing connection to endpoint [XX.XX.XX.XX]:8777 Cause => java.net.SocketTimeoutException {null}, Error-Count: 55

**************Log End************

Below are the Hazelcast properties from values.yaml in the Helm chart.

**********Values Start***************

hazelcast:
  # If you wish to connect to an existing Hazelcast instance set enabled to false
  # external to true, and uncomment and set url.
  enabled: false
  external: false
  # url: hazelcast.example.com:5701
  image:
    tag: "3.12.8"
  cluster:
    memberCount: 2
  mancenter:
    enabled: false
  hazelcast:
    yaml:
      hazelcast:
        network:
          join:
            multicast:
              enabled: false
            kubernetes:
              enabled: true
              service-name: ${serviceName}
              namespace: ${namespace}
              resolve-not-ready-addresses: true

**********Values End***************

Cause

In Kubernetes, pods (Gateway nodes) join and leave the cluster dynamically, each with a dynamically assigned IP. The cluster_info table still holds entries for old, unused Gateway nodes (IPs).
Each node therefore sends requests to these stale addresses to confirm cluster membership, which produces the errors shown in the log. The list grows with autoscaling: each time the deployment scales up and down, new nodes (IPs) are added to the table, and they persist until they are removed manually from the Gateway Dashboard or the cluster_info table is cleared.

Environment

Release : 10.0

Component : API GATEWAY

CONTAINER GATEWAY 

Resolution

Scale the Gateway deployment to zero and clear the cluster_info table.
You can also set the following system property to clean up inactive nodes older than x seconds (the background cleanup task is hardcoded to run every 24 hours):

com.l7tech.server.clusterStaleNodeCleanupTimeoutSeconds


Depending on the version in use, this is set in values.yaml under config (line 141):
  systemProperties: |-
    # By default, FIPS module will block an RSA modulus from being used for encryption if it has been used for
    # signing, or visa-versa. Set true to disable this default behaviour and remain backwards compatible.
    com.safelogic.cryptocomply.rsa.allow_multi_use=true
    # Specifies the type of Trust Store (JKS/PKCS12) provided by AdoptOpenJDK that is used by Gateway.
    # Must be set correctly when Gateway is running in FIPS mode. If not specified it will default to PKCS12.
    javax.net.ssl.trustStoreType=jks
    # Additional properties go here
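
As a minimal sketch, the stale-node cleanup property could be appended to the systemProperties block in values.yaml as shown here. The value 86400 (24 hours, expressed in seconds) is only an illustrative assumption, not a recommended setting; choose a timeout that suits how quickly pods are recycled in your cluster, and confirm that the property is supported in your Gateway version.

```yaml
config:
  systemProperties: |-
    # Existing properties from the chart stay as they are
    com.safelogic.cryptocomply.rsa.allow_multi_use=true
    javax.net.ssl.trustStoreType=jks
    # Assumption: example value only; nodes inactive for longer than
    # this many seconds are removed by the daily background cleanup task
    com.l7tech.server.clusterStaleNodeCleanupTimeoutSeconds=86400
```

Because the background task runs only every 24 hours, stale entries may remain in cluster_info for up to a day after the timeout has passed.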