Data Aggregator loses connection to the Data Collector.
On the Data Aggregator, the Data Collector shows as unknown in the GUI. The Data Aggregator karaf logs contain errors similar to the following:
WARN | atTimer-thread-3 | 2015-09-11 11:43:02,796 | DCHeartBeatLog | r.controller.DCMHeartbeatManager 131 | ore.collector.interfaces | | No response has been received from DC 623 in timeframe 58913 (ms):
ERROR | atTimer-thread-3 | 2015-09-11 11:43:02,797 | DCHeartBeatLog | impl.DCMContactStatusManagerImpl 116 | ager.core.collector.impl | | Lost contact to DC txanunxlipcp006.goldlnk.rootlnka.net:77156654-7599-4fc4-8f57-51e7f48a3aa1. State changed from RUNNING to CONTACT_LOST. The last heartbeat was received 58913 ms ago
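As a rough aid for confirming the symptom, the heartbeat messages shown above can be pulled out of a karaf log with grep. This is only a sketch: the function name find_contact_lost is made up for illustration, and the log file location depends on the installation, so it is passed in as an argument rather than assumed.

```shell
# Print the heartbeat warnings and errors from a karaf log file.
# $1 is the path to the log file (assumption: supplied by the caller,
# since the location differs between installations).
find_contact_lost() {
    grep -E 'DCHeartBeatLog.*(Lost contact to DC|No response has been received from DC)' "$1"
}
```

Called as find_contact_lost <path_to_karaf.log>, it prints only the matching lines, which makes it easier to see when and how often contact was lost.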
While working multiple issues of this type, standard connectivity tests showed no problems between the systems. However, tests run with larger packet sizes (4096 or 40960 bytes) showed significant packet loss between the systems, which points to a network problem.
To test connectivity between the systems with a larger packet size, use ping with the -s option, which sets the payload size in bytes. After terminating the ping command, check the summary at the end of the output for the % packet loss.
Example of command(s) to be run on the Data Aggregator:
ping -s 4096 <Data_Collector_system>
ping -s 40960 <Data_Collector_system>
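The same test can be scripted so that ping stops on its own and only the loss figure is reported. The following is a minimal sketch, assuming the Linux (iputils) ping, where -c sets the number of packets to send. The function names parse_packet_loss and check_loss and the default count of 20 are illustrative choices, not part of the product.

```shell
# Extract the packet-loss percentage from ping's summary line, e.g.
# "10 packets transmitted, 7 received, 30% packet loss, time 9012ms".
parse_packet_loss() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Send a fixed number of pings with the given payload size and print the loss.
# $1 = target host, $2 = payload size in bytes, $3 = packet count (default 20).
check_loss() {
    host=$1
    size=$2
    count=${3:-20}
    loss=$(ping -c "$count" -s "$size" "$host" 2>/dev/null | parse_packet_loss)
    echo "size=${size} loss=${loss:-unknown}%"
}
```

For example, running check_loss <Data_Collector_system> 4096 and then check_loss <Data_Collector_system> 40960 on the Data Aggregator gives one line per packet size. A loss figure that is near zero at the default size but rises at the larger sizes matches the behavior described above.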