After running for a 10-hour period, the Coordinator started to report errors indicating an overloaded system:
2018-06-15 04:12:31,908Z (23:12) [Event Sink Thread Pool Thread 5] INFO com.itko.lisa.stats.MetricControllerImpl - Error retrieving metric
com.itko.lisa.test.EventDeliveryException: Timeout putting events on internal event handler queue. This usually indicates an overloaded system.
at com.itko.lisa.simulator.EventHandler.testEvent(EventHandler.java:244)
at com.itko.lisa.stats.MetricControllerImpl.fireTestEvent(MetricControllerImpl.java:580)
at com.itko.lisa.stats.MetricControllerImpl.eventReceipt(MetricControllerImpl.java:523)
at com.itko.util.EventThread.execEvents(ThreadedEventSink.java:79)
at com.itko.util.EventThread.run(ThreadedEventSink.java:59)
2018-06-15 04:12:31,909Z (23:12) [Event Sink Thread Pool Thread 5] INFO com.itko.lisa.stats.MetricControllerImpl - Error retrieving metric
java.lang.IllegalStateException: Could not put anything new on the event queue
at com.itko.lisa.simulator.EventHandler.testEvent(EventHandler.java:249)
at com.itko.lisa.stats.MetricControllerImpl.fireTestEvent(MetricControllerImpl.java:580)
at com.itko.lisa.stats.MetricControllerImpl.eventReceipt(MetricControllerImpl.java:523)
at com.itko.util.EventThread.execEvents(ThreadedEventSink.java:79)
at com.itko.util.EventThread.run(ThreadedEventSink.java:59)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:380)
at com.itko.lisa.simulator.EventHandler.testEvent(EventHandler.java:242)
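The second trace shows the event being placed with LinkedBlockingQueue.offer, which fails inside lockInterruptibly. That frame suggests a bounded queue fed with a timed offer. The following Java sketch reproduces that general pattern in isolation, as an illustration only. The class BoundedQueueDemo and its methods are hypothetical and are not DevTest's EventHandler code, and treating the queue capacity as the equivalent of lisa.eventPool.maxQueueSize is an assumption.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT DevTest source: mimics a bounded event queue
// that producers feed with a timed offer(), as seen in the stack trace.
public class BoundedQueueDemo {
    private final LinkedBlockingQueue<String> queue;

    public BoundedQueueDemo(int maxQueueSize) {
        // Capacity assumed to play the role of lisa.eventPool.maxQueueSize.
        this.queue = new LinkedBlockingQueue<>(maxQueueSize);
    }

    /** Puts an event on the queue, failing if no slot frees up within the timeout. */
    public void putEvent(String event, long timeoutMillis) {
        try {
            boolean accepted = queue.offer(event, timeoutMillis, TimeUnit.MILLISECONDS);
            if (!accepted) {
                // The queue stayed full for the whole timeout: consumers are not keeping up.
                throw new IllegalStateException("Timeout putting event on queue");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and wrap the cause, similar to the second trace.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Could not put anything new on the event queue", e);
        }
    }

    public int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedQueueDemo demo = new BoundedQueueDemo(2);
        demo.putEvent("a", 10);
        demo.putEvent("b", 10);
        try {
            // Nothing consumes from the queue, so this offer times out.
            demo.putEvent("c", 10);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("Queue size: " + demo.size());
    }
}
```

Once the queue is full and nothing drains it, every further offer waits out its timeout and is rejected. This is why the errors point to an overloaded consumer rather than a defect in the queue itself.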
This applies to all supported DevTest releases.
Add these two properties to the local.properties file on the DevTest Coordinator machine:
lisa.eventPool.maxQueueSize=131070
lisa.pathfinder.on=false
Restart the Coordinator so that it picks up the new properties.
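Before restarting, you may want to confirm that the two properties above were saved correctly. The following Java sketch is one way to do that. CoordinatorPropsCheck is a hypothetical helper and is not part of DevTest. Because the article does not give the file's full path, the path to local.properties is passed as a command-line argument.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of DevTest: verifies that local.properties
// contains the two settings recommended in this article.
public class CoordinatorPropsCheck {

    // Returns the trimmed value, or null if the key is absent
    // (Properties.load keeps trailing whitespace in values).
    static String get(Properties props, String key) {
        String value = props.getProperty(key);
        return value == null ? null : value.trim();
    }

    /** Returns a list of problems; an empty list means both settings are present. */
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<>();
        String maxQueue = get(props, "lisa.eventPool.maxQueueSize");
        if (!"131070".equals(maxQueue)) {
            problems.add("lisa.eventPool.maxQueueSize is " + maxQueue + ", expected 131070");
        }
        String pathfinder = get(props, "lisa.pathfinder.on");
        if (!"false".equalsIgnoreCase(pathfinder)) {
            problems.add("lisa.pathfinder.on is " + pathfinder + ", expected false");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the Coordinator's local.properties file.
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            props.load(in);
        }
        List<String> problems = check(props);
        if (problems.isEmpty()) {
            System.out.println("Both properties are set as recommended.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```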
The timeout is not really a bug; it indicates that the system is overloaded. Increasing lisa.eventPool.maxQueueSize does not fix the underlying problem. It gives the event queue more capacity, so the timeout errors are delayed rather than eliminated.
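A little arithmetic shows why a larger queue only delays the errors. Consider a simple steady-rate model, which is an assumption and not a description of DevTest internals. If events arrive faster than they are drained, the backlog grows by (arrival rate minus drain rate) each second, and a queue of a given capacity is full after capacity / (arrival rate minus drain rate) seconds. Doubling the capacity therefore only doubles the time until the timeouts start. The rates below are purely illustrative.

```java
// Hypothetical steady-rate model, not DevTest code: estimates how long a
// bounded queue takes to fill when producers outpace consumers.
public class QueueFillTime {

    /** Seconds until a queue of the given capacity is full; infinite if it never fills. */
    static double secondsUntilFull(int capacity, double arrivalPerSec, double drainPerSec) {
        double growthPerSec = arrivalPerSec - drainPerSec;
        if (growthPerSec <= 0) {
            // Consumers keep up, so the backlog never grows.
            return Double.POSITIVE_INFINITY;
        }
        return capacity / growthPerSec;
    }

    public static void main(String[] args) {
        // Illustrative rates only; they are not measurements from a Coordinator.
        double arrival = 1_000;
        double drain = 990;
        for (int capacity : new int[] {131_070, 262_140}) {
            double hours = secondsUntilFull(capacity, arrival, drain) / 3600.0;
            System.out.printf("capacity %d: full after about %.1f hours%n", capacity, hours);
        }
    }
}
```

In this model only reducing the load or speeding up the consumers (making the drain rate at least equal to the arrival rate) prevents the queue from filling, which matches the point that raising the queue size buys time but is not a fix.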