Data Aggregator loses connection to Data Collector in CA Performance Management (CAPM)

Article ID: 35398

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration

Issue/Introduction

Data Aggregator loses connection to the Data Collector.

On the Data Aggregator, the Data Collector's status changes to unknown in the GUI. The Data Aggregator karaf log shows errors similar to the following:

WARN | atTimer-thread-3 | 2015-09-11 11:43:02,796 | DCHeartBeatLog | r.controller.DCMHeartbeatManager 131 | ore.collector.interfaces | | No response has been received from DC 623 in timeframe 58913 (ms):

ERROR | atTimer-thread-3 | 2015-09-11 11:43:02,797 | DCHeartBeatLog | impl.DCMContactStatusManagerImpl 116 | ager.core.collector.impl | | Lost contact to DC txanunxlipcp006.goldlnk.rootlnka.net:77156654-7599-4fc4-8f57-51e7f48a3aa1. State changed from RUNNING to CONTACT_LOST. The last heartbeat was received 58913 ms ago
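To locate these events in a large karaf log, you can search for the two message fragments shown above. The following is a minimal sketch only: the function name find_dc_contact_loss is hypothetical, and because the karaf log location varies by installation, the path is an assumption you supply as the first argument.

```shell
#!/bin/sh
# Sketch: list heartbeat-loss messages from a Data Aggregator karaf log.
# $1 is the karaf log file; its location is an assumption you must supply.
find_dc_contact_loss() {
  log_file="$1"
  # -E enables the | alternation; -n prefixes each match with its line number
  # so each event can be located in the log.
  grep -n -E 'No response has been received from DC|State changed from RUNNING to CONTACT_LOST' "$log_file"
}
```

For example, `find_dc_contact_loss /path/to/karaf.log` prints every matching WARN and ERROR line, which helps correlate the times of lost contact with any network testing.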

Cause

Across multiple issues of this type, standard connectivity tests showed no problems between the systems. However, tests run with larger packet sizes (4096 or 40960 bytes) revealed significant packet loss between the systems, indicating a network problem.

Environment

CAPM 3.x

Resolution

To test connectivity between the systems with a larger packet size, use ping with the -s option, which sets the number of data bytes sent in each packet. After stopping the ping command (for example, with Ctrl+C), check the % packet loss reported in the summary of the output.

Example commands to run on the Data Aggregator:

ping -s 4096 <Data_Collector_system>

ping -s 40960 <Data_Collector_system>
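To run both tests without stopping ping by hand, you can send a fixed number of packets with -c and extract the loss figure from the summary line. This is a minimal sketch that assumes a Linux ping, where -c sets the packet count and -s the payload size. The function names and the default of 20 packets per size are illustrative choices, not values from this article.

```shell
#!/bin/sh
# Sketch: report packet loss to a Data Collector at the larger payload sizes.
# Assumes Linux ping (-c = packet count, -s = data bytes per packet).

# Read ping output on stdin and print the number that precedes "% packet loss".
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# $1: Data Collector hostname or IP; $2: packets per size (assumed default 20).
check_dc_packet_loss() {
  host="$1"
  count="${2:-20}"
  for size in 4096 40960; do
    # ping exits non-zero when replies are lost, but its summary is still parsed.
    loss=$(ping -c "$count" -s "$size" "$host" | packet_loss)
    echo "size=${size} loss=${loss:-unknown}%"
  done
}
```

For example, `check_dc_packet_loss <Data_Collector_system>` prints one line per packet size. A consistently non-zero loss at these sizes, while small default pings succeed, matches the pattern described in the Cause section.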