We recently upgraded our DEV environment, and the Data Collector keeps disconnecting and reconnecting.
The Data Collector services require a restart once a week to prevent data gaps or a DC crash.
One collector goes into a continuous disconnect/reconnect state about once a week, and I have to restart the collector each time to stop it. Memory was low and disk space was fine. I am not sure why memory would be low, however, as we have not added any large number of devices.
The Data Collector Health dashboard, available from the Portal System Health options, shows the DC's heap usage growing in a steady ramp after a restart. Once heap usage reaches the allocated memory, the DC disconnects and requires a dcmd service restart to function again.
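Because the heap grows in a steady ramp, a straight-line fit over a few readings gives a rough idea of how long the DC has before it hits its allocated heap. The sketch below is a hypothetical helper, not a DX NetOps feature: it assumes you copy (hours since restart, heap MB) readings by hand from the Data Collector Health dashboard and that growth is roughly linear. The function name, sample values, and the 4096 MB limit are all illustrative assumptions.

```python
from typing import Optional, Sequence


def hours_until_heap_exhausted(
    hours: Sequence[float],
    heap_mb: Sequence[float],
    max_heap_mb: float,
) -> Optional[float]:
    """Hypothetical helper: least-squares fit of heap_mb = slope * hours + intercept.

    Returns the hours remaining after the last sample until the fitted line
    reaches max_heap_mb, 0.0 if it is already past the limit, or None if the
    trend is flat or falling (no exhaustion predicted).
    """
    n = len(hours)
    if n < 2 or n != len(heap_mb):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_h = sum(heap_mb) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(hours, heap_mb))
    slope = sxy / sxx  # heap growth in MB per hour
    if slope <= 0:
        return None
    intercept = mean_h - slope * mean_t
    t_full = (max_heap_mb - intercept) / slope  # hours since restart at the limit
    return max(0.0, t_full - hours[-1])


# Illustrative readings taken once a day after a restart (assumed values).
remaining = hours_until_heap_exhausted([0, 24, 48, 72], [1000, 1500, 2000, 2500], 4096)
print(round(remaining, 1))  # prints 76.6
```

With these assumed numbers the heap grows about 21 MB per hour, so the DC would reach the limit roughly three days after the last reading, which is consistent with the weekly restart pattern described above.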
All supported DX NetOps Performance Management releases
The Data Collector is oversubscribed due to an excessive number of QoS items and their polling load.
The QoS metric family had been deployed, which caused a large increase in monitored items. Memory usage grew as a result until the collector went down.
Removing devices from association with the QoS Monitoring Profile and its related Metric Families reduced the number of polled QoS items and allowed the DC to function properly.
If additional QoS monitoring is required, ensure it is applied only to the devices that need it.
Ensure that devices which create large QoS item counts have filters in place, so that only the items that require polling are created and polled.
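To show why filtering matters, the sketch below models how an include filter shrinks the number of polled items. It is a conceptual illustration only: filters are configured within DX NetOps itself, and nothing here uses a product API. The device names, QoS class names, and the `voice|video` rule are hypothetical.

```python
import re
from typing import Dict, List


def filter_items(items_by_device: Dict[str, List[str]], include_pattern: str) -> Dict[str, List[str]]:
    """Keep only item names matching include_pattern (a hypothetical filter rule)."""
    rx = re.compile(include_pattern)
    return {dev: [i for i in items if rx.search(i)] for dev, items in items_by_device.items()}


def total_items(items_by_device: Dict[str, List[str]]) -> int:
    """Count all items across all devices."""
    return sum(len(items) for items in items_by_device.values())


# Hypothetical inventory: each router exposes many QoS class items,
# but only the voice and video classes actually need polling.
inventory = {
    "router-a": [f"class-{n}" for n in range(200)] + ["class-voice", "class-video"],
    "router-b": [f"class-{n}" for n in range(150)] + ["class-voice"],
}
filtered = filter_items(inventory, r"voice|video")
print(total_items(inventory), "->", total_items(filtered))  # prints 353 -> 3
```

In this made-up case, filtering cuts the polled item count from 353 to 3; the real reduction depends entirely on the devices and filters in your environment.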
In some cases a new DC may be needed to handle the additional load from QoS polling. See the DX NetOps Sizing Tool for help determining the required resources.