We recently added some proxy devices in PC, and they appear to work. However, the monitored device details show a management lost message. We need to determine the cause of that message and of the inconsistent data.
Reporting data for a number of devices from the same vendor is full of gaps, appears only sporadically in reports, and shows bad Reachability/Availability data and device status values. The devices all use the same Monitoring Profile configuration. Investigation shows many timed out responses in the discovery cycles, as well as in detailed poll logging. Those failed responses are consistent with the spotty data.
These are previously discovered devices that were upgraded from SNMPv2c to SNMPv3. The devices share a common, duplicated engineID value. The SNMPv3 RFCs require each agent configured for SNMPv3 to use a unique engineID value. The Data Aggregator enforces this and does not allow discovery of new devices that have the same SNMPv3 engineID. It discovers one of the devices with the shared value but rejects any others that have the same configuration. In this instance, because these are existing SNMPv2c devices that were updated to SNMPv3, the issue does not appear as a Discovery failure. Instead it shows up as polling anomalies.
As a result, each poll cycle is a gamble as to which of the devices on a given DC with a common SNMPv3 engineID wins the cycle. Only the device that wins a given poll cycle gets its data polled. The winner flaps back and forth between devices, so a single device only rarely has enough consecutive successful poll cycles to generate deltas and data for DB insertion.
DX NetOps Performance Management Data Aggregator polling problems and spotty report data from SNMPv3 managed devices.
All supported DX NetOps Performance Management releases
Devices are configured in a way that duplicates the engine ID between devices.
Every SNMPv3 device must have a unique engine ID not shared by other devices.
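This is described in RFC 5343, found here: https://tools.ietf.org/html/rfc5343. The underlying definition of snmpEngineID as a unique identifier within an administrative domain is in RFC 3411.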
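If the engine IDs configured on the devices are already known, for example from an inventory export, the uniqueness requirement can be checked before looking at the Data Aggregator at all. The following is a minimal Python sketch of that check. The `inventory` mapping, its addresses (taken from the 192.0.2.0/24 documentation range), the engine ID values, and the function name are all hypothetical and are not part of DX NetOps.

```python
from collections import defaultdict


def find_duplicate_engine_ids(inventory):
    """Return {engine_id: [addresses]} for engine IDs used by more than one address."""
    by_engine = defaultdict(list)
    for address, engine_id in inventory.items():
        # Normalize case and separators so "80:00:..." and "8000..." compare equal.
        key = engine_id.lower().replace(":", "").replace(" ", "")
        by_engine[key].append(address)
    return {eid: sorted(addrs) for eid, addrs in by_engine.items() if len(addrs) > 1}


# Hypothetical inventory: device address -> configured SNMPv3 engine ID.
inventory = {
    "192.0.2.10": "80:00:1F:88:80:AA:BB:CC:DD",
    "192.0.2.11": "80001f8880aabbccdd",
    "192.0.2.12": "80001f8880eeff0011",
}

duplicates = find_duplicate_engine_ids(inventory)
for engine_id, addresses in duplicates.items():
    print(f"engine ID {engine_id} is shared by: {', '.join(addresses)}")
```

Any engine ID reported by this check is shared by two or more devices and would be expected to cause the polling behavior described above.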
To validate whether this is the cause, we can use information from the DC snmpSession data. The method for accessing it depends on the Performance Management release involved.
In the output from that URL, search for the entries that show known SNMPv3 engine ID values and the IP addresses that have reported them. Where more than one IP address is listed against a given SNMPv3 engine ID, multiple devices are reporting the same engine ID.
Sample section showing three devices reporting the same engine ID.
EngineBoots: 8, EngineTime: 2999050
MULTIPLE ADDRESSES:[188.8.131.52/161, 184.108.40.206/161, 220.127.116.11/161]
Note that the cache is not cleared until the DC is restarted, so if the engineID has changed on the device you may still see the older entry.
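When the snmpSession output is large, the search can be scripted. The sketch below is a minimal Python example that assumes the output has been saved as text and that shared engine IDs are flagged with a `MULTIPLE ADDRESSES:[...]` line in the form shown in the sample above. The function name is hypothetical, and the line format is inferred only from that sample, so it may need adjusting for a given release.

```python
import re

# Matches lines such as: MULTIPLE ADDRESSES:[a/161, b/161]
MULTI_RE = re.compile(r"MULTIPLE ADDRESSES:\s*\[([^\]]*)\]")


def extract_shared_addresses(text):
    """Return one list of address/port strings per MULTIPLE ADDRESSES line."""
    groups = []
    for line in text.splitlines():
        match = MULTI_RE.search(line)
        if match:
            addresses = [a.strip() for a in match.group(1).split(",") if a.strip()]
            groups.append(addresses)
    return groups


# The sample section from this article, used as input.
sample = """EngineBoots: 8, EngineTime: 2999050
MULTIPLE ADDRESSES:[188.8.131.52/161, 184.108.40.206/161, 220.127.116.11/161]"""

groups = extract_shared_addresses(sample)
for addresses in groups:
    print(f"{len(addresses)} addresses share one engine ID: {addresses}")
```

Because the cache is only cleared on a DC restart, any address this reports should be confirmed against the device's current configuration before acting on it.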
Reconfigure affected devices to ensure they have unique SNMPv3 engine ID values.
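How the engine ID is set is vendor specific, so consult the device documentation for the actual procedure. As an illustration of what a unique value can look like, the Python sketch below builds an engine ID in the RFC 3411 SnmpEngineID layout: four octets holding the enterprise number with the high bit set, one format octet (3 means a MAC address follows), then the device's MAC address. The function name, the enterprise number 99999, and the MAC addresses are hypothetical placeholders. Deriving the value from each device's own MAC address is one common way to keep the values distinct, not a DX NetOps requirement.

```python
def build_mac_engine_id(enterprise_number, mac):
    """Build an RFC 3411 style engine ID (hex string) from an enterprise number and a MAC."""
    if not 0 <= enterprise_number < 2**31:
        raise ValueError("enterprise number must fit in 31 bits")
    mac_hex = mac.replace(":", "").replace("-", "").lower()
    if len(mac_hex) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    # First four octets: enterprise number with the most significant bit set.
    prefix = (0x80000000 | enterprise_number).to_bytes(4, "big")
    # Fifth octet 0x03 indicates that a MAC address follows.
    # bytes.fromhex raises ValueError if mac_hex contains non-hex characters.
    return (prefix + b"\x03" + bytes.fromhex(mac_hex)).hex()


# Hypothetical devices: each MAC yields a different engine ID.
engine_a = build_mac_engine_id(99999, "00:1A:2B:3C:4D:5E")
engine_b = build_mac_engine_id(99999, "00:1A:2B:3C:4D:5F")
print(engine_a)
print(engine_b)
```

After reconfiguring, keep in mind the note above about the DC cache: older engine ID entries may remain visible until the DC is restarted.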