The Data Repository and Data Aggregator were down for around 36 hours.
After they were up and stable again, the two Data Collectors in the environment both automatically connected back to the Data Aggregator.
No Data Collector service restarts were needed.
One Data Collector shows a data gap for the outage period.
One Data Collector does not show a data gap for the outage period.
The environment was recently upgraded from the older r3.2 release to the newer r3.7.9 release.
All supported Performance Management releases
The Data Collector with the data gap had an upgrade failure resulting in the old r3.2 release still running.
Data Collectors can keep running an older release for a short time after the Data Aggregator is upgraded. This is intended to allow continuous data polling, minimizing data gaps until the Data Collector itself is upgraded.
This function is intended to be used for as short a time as possible. It allows the Data Collector to continue polling and sending data back to the Data Aggregator. It does not permit changes to existing items or the addition of new ones: discoveries and item changes or updates via Metric Family Updates or Change Detection do not work.
In this case, no logging is available that points to a specific problem. The likely explanation is that, because the Data Collector had been running the old release for so long, after the 36-hour outage it either failed to pass its data back to the Data Aggregator or had already lost that data and had none to send.
There is no way to recover the lost data.
Ensure that Data Collector upgrades succeed. If upgrade failures are seen, resolve them as soon as possible to minimize data gaps.
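As a rough illustration of a post-upgrade check, the sketch below compares each Data Collector's release against the Data Aggregator's and lists any that are still behind. It is not a product API: the host names, the inventory dictionary, and both function names are hypothetical, and how the release strings are gathered from the environment is left out as an assumption.

```python
from __future__ import annotations


def parse_release(release: str) -> tuple[int, ...]:
    """Turn a release string such as 'r3.7.9' or '3.2' into a comparable tuple."""
    return tuple(int(part) for part in release.lstrip("rR").split("."))


def find_outdated_collectors(aggregator_release: str, collectors: dict[str, str]) -> list[str]:
    """Return the hosts whose release is older than the Data Aggregator's."""
    target = parse_release(aggregator_release)
    return sorted(
        host for host, release in collectors.items()
        if parse_release(release) < target
    )


# Hypothetical inventory mirroring the scenario in this article:
# one Data Collector upgraded, one still on the old release.
collectors = {"dc-01.example.com": "r3.7.9", "dc-02.example.com": "r3.2"}
print(find_outdated_collectors("r3.7.9", collectors))
```

Any host this reports is a Data Collector whose upgrade should be investigated and completed promptly.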