Coordinator stops responding - OutOfMemoryError


Article ID: 133637



Products

CA Application Test

Issue/Introduction

I am running into intermittent instances where CVS tests stop running. This has happened daily for the last 3 days. There have been no environmental changes.

I have traced this back to the fact that the Coordinator appears to have stopped responding.

The coordinator count in the Portal/Server Health window shows 0 coordinators.

The coordinator java process IS still running.

The coordinator log is showing OutOfMemoryErrors.

I will upload the coordinator logs.



Environment

Release : All supported DevTest versions

Component : CA Application Test

Cause

The coordinator log shows errors such as the following:

SEVERE: Could not accept connection : java.net.SocketException: Connection reset

2019-06-14 06:37:56,818Z (02:37) [Event Sink Thread Pool Thread 5] INFO com.itko.lisa.stats.MetricControllerImpl - Error retrieving metric

java.lang.IllegalStateException: Could not put anything new on the event queue

2019-06-14 14:54:14,447Z (10:54) [amq dbwriter #735 for queue reporting_735 report USPS_DB

2019-06-14 05:52:43,724Z (01:52) [Event Sink Thread Pool Thread 2] INFO com.itko.lisa.stats.MetricControllerImpl - Error retrieving metric
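
To see how often these messages occur, the coordinator log can be scanned for them. The following is a minimal Python sketch, not a DevTest utility: the function name count_signatures is hypothetical, and the signature strings are simply copied from the log excerpts and the OutOfMemoryError mentioned in this article.

```python
from collections import Counter
from typing import Iterable

# Substrings taken from the log excerpts in this article (not an official list).
SIGNATURES = (
    "OutOfMemoryError",
    "Could not put anything new on the event queue",
    "Error retrieving metric",
    "Could not accept connection",
)


def count_signatures(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each known error signature."""
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts
```

To use it, open the coordinator log file (its location depends on your installation) and pass the file object to count_signatures; a steadily rising count of event queue errors before the first OutOfMemoryError would be consistent with the overload described in this article.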

Resolution

Add these two properties to the local.properties file on the DevTest Coordinator machine:

lisa.eventPool.maxQueueSize=131070

lisa.pathfinder.on=false

Then restart the Registry, Coordinator, and Simulator to pick up the new properties.
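
Before restarting, it can help to confirm that both settings are actually present in the file. The sketch below is a simplified Python check, not part of DevTest: parse_properties and missing_or_different are hypothetical helper names, and the parser handles only plain key=value lines (no line continuations, escapes, or ":" separators).

```python
from typing import Dict, Iterable, Optional, Tuple

# Values recommended in this article.
EXPECTED = {
    "lisa.eventPool.maxQueueSize": "131070",
    "lisa.pathfinder.on": "false",
}


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            # A later occurrence of the same key overrides an earlier one.
            props[key.strip()] = value.strip()
    return props


def missing_or_different(
    props: Dict[str, str], expected: Dict[str, str] = EXPECTED
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (expected, actual or None)} for each setting that does not match."""
    return {
        key: (want, props.get(key))
        for key, want in expected.items()
        if props.get(key) != want
    }
```

Read local.properties with open(), pass the file object to parse_properties, and then call missing_or_different on the result; an empty dictionary means both settings match the values above.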

The timeout issue is not really a bug; it is an indication that the system is overloaded.

 

When using connection pooling for load tests (multiple virtual users), you may need to increase the lisa.jdbc.pool.maxPoolSize property so that the pool does not run out of connections (starvation):

lisa.jdbc.pool.maxPoolSize=25 (default is 10)

Increasing lisa.eventPool.maxQueueSize will not fix the underlying problem, but it gives the system more resources so that the timeout errors are delayed.

Also, check that there is sufficient space in the Registry database and that there are no connection problems accessing it.
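
As a first connectivity check, you can test whether the database port is reachable from the Coordinator machine. The Python sketch below uses only the standard socket module; the function name can_connect is hypothetical, and the host and port must come from your own Registry database configuration. It verifies TCP reachability only, not credentials, JDBC settings, or free space in the database.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result points to a network, firewall, or listener problem. A True result only shows that the port is open, so database space and login still need to be checked with the database's own tools.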