High CPU usage in Service Registry and "SocketTimeoutException" found in logs



Article ID: 297161


Updated On:

Products

Support Only for Spring

Issue/Introduction

The symptoms of this issue are high CPU usage (it may exceed 100%) and high memory usage in Service Registry.

The Service Registry backing app shows the following in its logs:
 
2024-02-15T18:10:51.24+0300 [APP/PROC/WEB/1] OUT 2024-02-15 15:10:51.243 ERROR 14 --- [fulanito.com.tr-17] c.n.e.cluster.ReplicationTaskProcessor   : It seems to be a socket read timeout exception, it will retry later. if it continues to happen and some eureka node occupied all the cpu time, you should set property 'eureka.server.peer-node-read-timeout-ms' to a bigger value
2024-02-15T18:10:51.24+0300 [APP/PROC/WEB/1] OUT com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out


This can happen when a large number of applications (more than 500) are bound to a Service Registry service instance where peer replication is configured, or where the "count" parameter is greater than one (that is, there is more than one instance of the backing app), which configures peer replication internally.

Peer replication at this scale has been shown to be a high-load operation that consumes significant CPU resources, resulting in a bottleneck.


Environment

Product Version: 2.1

Resolution

It is not recommended to have more than 200 client applications bound to a single Service Registry service instance.

Users facing this issue need to create new Service Registry service instances and gradually redirect the applications to them.

In other words, users need to group the client applications so that no group exceeds 200 apps, and configure each group to use one of those Service Registry service instances. Each group must be isolated, so that apps in one group don't communicate with apps in other groups.
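The grouping described above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the registry names (`service-registry-1`, ...), the app names, the `old-service-registry` name, and the `p-service-registry` offering with the `standard` plan are assumptions that may differ in your environment (check `cf marketplace`). The script only builds a list of cf CLI command strings for review; it does not execute anything. It splits apps in list order, so apps that discover one another through the registry should be ordered so that they land in the same group.

```python
from typing import Dict, List

MAX_APPS_PER_REGISTRY = 200  # limit recommended in this article


def group_apps(app_names: List[str],
               max_per_group: int = MAX_APPS_PER_REGISTRY) -> Dict[str, List[str]]:
    """Split app names into consecutive groups of at most max_per_group apps."""
    if max_per_group < 1:
        raise ValueError("max_per_group must be at least 1")
    groups: Dict[str, List[str]] = {}
    for start in range(0, len(app_names), max_per_group):
        # Hypothetical naming scheme: service-registry-1, service-registry-2, ...
        registry = f"service-registry-{start // max_per_group + 1}"
        groups[registry] = app_names[start:start + max_per_group]
    return groups


def migration_commands(groups: Dict[str, List[str]],
                       old_registry: str) -> List[str]:
    """Build cf CLI command strings for review; nothing is executed here."""
    commands: List[str] = []
    for registry, apps in groups.items():
        # Assumed offering and plan names. Service creation is asynchronous,
        # so wait until `cf service <name>` reports success before binding.
        commands.append(f"cf create-service p-service-registry standard {registry}")
        for app in apps:
            commands.append(f"cf unbind-service {app} {old_registry}")
            commands.append(f"cf bind-service {app} {registry}")
            commands.append(f"cf restage {app}")
    return commands


if __name__ == "__main__":
    # Hypothetical app names, for illustration only.
    apps = [f"app-{i}" for i in range(1, 451)]
    plan = group_apps(apps)
    for registry, members in plan.items():
        print(registry, len(members))
    for command in migration_commands(plan, "old-service-registry")[:4]:
        print(command)
```

Moving one group at a time, and confirming that its apps register correctly before starting the next, keeps the redirection gradual as recommended above.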

This solution will provide high availability, reduce load, and improve the user's architecture.

 

 

Note: This issue is fixed in version 3.1 and above.