snmpcollector - Disk & Hardware Status metrics not collected on Dell PowerEdge devices after upgrade to SNMPC v4.12 (from 4.03)

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

With snmpcollector version 4.03, Disk and Host->Hardware Status metrics are being collected for Dell PowerEdge R630 machines, but those sections/metrics are not being collected when using snmpcollector version 4.12.

The request was to investigate what, if anything, changed between snmpcollector v4.03 and v4.12, and/or in the associated DCD version, to create this gap.

Environment

Release: 20.4
snmpcollector v4.03, 4.12
hub 9.38/9.39
Dell PowerEdge devices, e.g., R650

Cause

The contents of the snmpcollector 'CustomVendorCertifications' folder differed between hubs.

Resolution

We found that the old hubs were collecting data using the custom vendor certification files, but these files were missing on the new hubs, e.g., hubxx and hubyy. Overall, the problems with some metrics not being displayed in the navigation tree, and not being enabled/monitored, had nothing to do with the snmpcollector version (4.03 vs 4.12). In most cases the templates were being applied, but some templates also had individual metrics enabled or disabled, so this was worth checking as well.

Certification files are used to load the device and its MIBs, and templates are used to enable monitoring of a metric if that metric exists. Because the custom certification files were missing, the metrics were missing from the device tree and hence were not available to the template.

Essentially, some custom vendor certification files had been created via snmpcollector ‘self-certification’ of devices/DCD, but they were not present in the 'CustomVendorCertifications' folder on the newly created hubs.

From this point forward, whenever snmpcollector monitoring is moved from one hub to another via the custom ‘move-to-hub#’ package(s), any templates and custom-made certification files should ALSO be copied over to the target hub. This is a requirement because the snmpcollector CustomVendorCertifications folder on the target hub may not contain the same contents. The best practice is to copy the snmpcollector templates / bulk_config folder to any newly created or updated hub, since the bulk_config folder should contain the current templates and templateDefinition.json.
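Before and after copying, it can help to confirm that the CustomVendorCertifications folder really matches between the source and target hubs. The following is a minimal Python sketch, not a UIM tool: it assumes both folders are reachable as local paths (for example, copies pulled from each hub), and the function names are hypothetical. It compares files by relative path and SHA-256 digest.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its content digest."""
    return {
        item.relative_to(folder).as_posix(): file_digest(item)
        for item in sorted(folder.rglob("*"))
        if item.is_file()
    }


def compare_folders(source: Path, target: Path) -> dict[str, list[str]]:
    """Report files missing on the target, extra on the target, or differing."""
    src, dst = snapshot(source), snapshot(target)
    shared = set(src) & set(dst)
    return {
        "missing_on_target": sorted(set(src) - set(dst)),
        "extra_on_target": sorted(set(dst) - set(src)),
        "different": sorted(name for name in shared if src[name] != dst[name]),
    }
```

Calling compare_folders(Path(old_copy), Path(new_copy)) with the two folder copies returns three lists. Any entry under "missing_on_target" is a certification file that still needs to be copied to the new hub.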

During this process, the customer also decided to delete the metadata from the S_QOS_DATA tables for the given hub and then we selected ‘Force Component Rediscovery’ in the snmpcollector probe v4.12 on hubxx.

We restarted the snmpcollector probe and the test results were very good: the expected metrics were displayed in line with the current configuration, and the correct number of metrics (11) was present:

1. CPU 1.3.6.1.4.1.674.10892.1.1100.30.1.5
2. Electric Current 1.3.6.1.4.1.674.10892.1.600.30.1.5
3. Disk 1.3.6.1.4.1.674.10893.1.20.130.4.1.4
4. Disk Volume (Virtual Disk) 1.3.6.1.4.1.674.10893.1.20.140.1.1.4
5. Hardware Status (Chassis) 1.3.6.1.4.1.674.10892.1.200.10.1.41
6. Memory 1.3.6.1.4.1.674.10892.1.1100.50.1.5
7. Temperature 1.3.6.1.4.1.674.10892.1.700.20.1.6
8. Fan Status (CoolingDeviceStatus) 1.3.6.1.4.1.674.10892.1.700.12.1.5.1
9. Fan Speed (CoolingDeviceReading) 1.3.6.1.4.1.674.10892.1.700.12.1.6
10. Session (Intrusion Detection) 1.3.6.1.4.1.674.10892.1.300.70.1.5
11. Power Supply 1.3.6.1.4.1.674.10892.1.600.12.1.5

OIDs 8 through 11 in the list above (Fan Status, Fan Speed, Session, and Power Supply) are present on 'R' class models, e.g., R650. Blade Servers (PowerEdge M Class) won’t have these OIDs and therefore have only 7 MetricFamily graphs.
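The split between the 7 families common to both classes and the 4 families seen only on 'R' class models can be captured as a small lookup, which makes it easy to tell which families are missing for a given device. This is a minimal Python sketch built only from the names and OIDs listed in this article. The function names are hypothetical, and classifying a device by the first letter of its model name is an assumption made for illustration.

```python
# Metric families and OIDs exactly as listed in this article.
COMMON_FAMILIES = {
    "CPU": "1.3.6.1.4.1.674.10892.1.1100.30.1.5",
    "Electric Current": "1.3.6.1.4.1.674.10892.1.600.30.1.5",
    "Disk": "1.3.6.1.4.1.674.10893.1.20.130.4.1.4",
    "Disk Volume (Virtual Disk)": "1.3.6.1.4.1.674.10893.1.20.140.1.1.4",
    "Hardware Status (Chassis)": "1.3.6.1.4.1.674.10892.1.200.10.1.41",
    "Memory": "1.3.6.1.4.1.674.10892.1.1100.50.1.5",
    "Temperature": "1.3.6.1.4.1.674.10892.1.700.20.1.6",
}
R_CLASS_ONLY_FAMILIES = {
    "Fan Status (CoolingDeviceStatus)": "1.3.6.1.4.1.674.10892.1.700.12.1.5.1",
    "Fan Speed (CoolingDeviceReading)": "1.3.6.1.4.1.674.10892.1.700.12.1.6",
    "Session (Intrusion Detection)": "1.3.6.1.4.1.674.10892.1.300.70.1.5",
    "Power Supply": "1.3.6.1.4.1.674.10892.1.600.12.1.5",
}


def expected_families(model: str) -> dict[str, str]:
    """Return the families expected for a model name such as 'R650'.

    Assumption: the first letter of the model name identifies the class.
    Only the M and R classes are described in the article.
    """
    series = model.strip().upper()[:1]
    if series == "M":
        return dict(COMMON_FAMILIES)
    if series == "R":
        return {**COMMON_FAMILIES, **R_CLASS_ONLY_FAMILIES}
    raise ValueError(f"model class not covered by the article: {model!r}")


def missing_families(model: str, displayed: set[str]) -> list[str]:
    """List expected families that are not in the displayed set."""
    return sorted(set(expected_families(model)) - set(displayed))
```

For example, missing_families("R650", {"CPU", "Memory"}) returns the nine families that should still appear under the profile for that device.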

We then manually checked all of the enabled metrics and the configuration; everything was configured as expected, according to the template.
After we copied all of the existing custom vendor certification files over to the new hubs, all of the expected metrics (11) were displayed under the snmpcollector profiles.
Note that a 'Field services' resource performed the device 'self-certification' process in the past, but we were previously unaware of this fact.
In the meantime, we requested a change to the poll interval on all of the previously reported non-working hubs, from 24 hours to 5 minutes (300 seconds), so that QOS data collection can happen on those hubs and stable data can be established in a shorter time period.

Additional Information

The last steps taken for one snmpcollector profile that was not showing all 11 metrics were as follows:

Deleted the device <device_name> from the snmpcollector probe configuration.
Re-added it and discovered it again via 'Force Component Rediscovery'.
Waited for 2 polling intervals to pass.
Checked the QOS metrics status.
All reported issues in the case were resolved.
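
The wait of 2 polling intervals is where the poll interval change described in the Resolution pays off. As a small worked example in Python (the function name is hypothetical), the wait before the QOS metrics can be checked works out as follows for the old and new intervals:

```python
from datetime import timedelta


def wait_before_check(poll_interval_seconds: int, intervals: int = 2) -> timedelta:
    """Time to wait for the given number of polling intervals to pass."""
    return timedelta(seconds=poll_interval_seconds * intervals)


print(wait_before_check(24 * 60 * 60))  # old 24-hour interval: 2 days, 0:00:00
print(wait_before_check(300))           # new 300-second interval: 0:10:00
```

With the 24-hour interval a single verification cycle takes two days; with the 300-second interval it takes ten minutes.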