cdm metrics not getting collected properly for some servers

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

On some servers the profiles are active, but no cdm CPU and/or memory metrics are collected. In addition, robot and source hold different values in the S_QOS_DATA table.

Environment

CDM 8.03 or higher
UIM 23.4 CU3 or higher
MCS in use

Cause

Some CPU and memory metrics use the target as the hostname.
UIM cannot ignore or exclude a host entry in the /etc/hosts file.

Resolution

MCS or Non-MCS
CPU and memory QOS are missing for hostname_example_02.
The cdm probe log shows that the QOS data is being collected and sent.
In one case the issue resolved on its own and no QOS is missing for that server any longer, but the same problem then appeared on another server, seemingly at random.
hostname_example_02 belongs to a group.
hostname_example_03 belongs to group->None and does not show up in the Inventory.
hostname_example_02 and hostname_example_03 have the same IP address in the database.
The QOS data is being saved in two places in the database, but memory is not being saved for hostname_example_03.
We checked whether the other reported machines that are not collecting QOS also fail to resolve correctly via DNS.
- nslookup on two hosts with similar names, one ending in '02' and the other in '03', returned exactly the same IP address: duplicate DNS records.
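
The nslookup comparison above can be repeated across many hosts with a short script. The following is a minimal sketch, not a UIM tool: the function name find_duplicate_ips and its resolve parameter are hypothetical, and it assumes that ordinary forward resolution from the machine running the script reflects what the robots see.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_duplicate_ips(
    hostnames: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, List[str]]:
    """Group hostnames by resolved IPv4 address; return only shared addresses."""
    by_ip: Dict[str, List[str]] = defaultdict(list)
    for name in hostnames:
        try:
            by_ip[resolve(name)].append(name)
        except OSError:
            # Names that do not resolve are skipped here; check them separately.
            continue
    # Keep only the addresses that more than one hostname resolved to.
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}
```

Calling find_duplicate_ips(["hostname_example_02", "hostname_example_03"]) with real hostnames in place of the placeholders returns a dictionary keyed by each shared IP address. An empty result means that no two of the supplied names resolved to the same address.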

Note: The UIM SLM page has a Show statistics option that shows where the QOS data is currently being saved.

As seen in the OC SLM, the QOS data was being saved across both source hostnames because both resolved to the same IP address.

In this first scenario, that is probably also why the first reported machine recovered on its own: DNS propagation finally completed.

DNS host record propagation can sometimes take 24-48 hours or more, depending on where the servers sit.

The first step is therefore to contact the network team and confirm that the DNS records have been corrected and propagated as expected.

In this particular application environment, most of the Unix servers have one hostname configured for applications and a different hostname configured for UIM monitoring.

The DNS entry showed the current IP address and hostname, which match robot.cfg, but the robot's /etc/hosts file uses the application hostname.

UIM reads the hosts file and sends data under the application hostname. This was not visible in the console (OC Inventory), but it was visible in the database. More than 100 servers had this issue, which was causing problems with cdm QOS metric collection.
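
To find robots in this state, the hosts file can be compared against the robot name. The following is a rough sketch under stated assumptions: the helper names parse_hosts and robot_name_mismatch are hypothetical, and it assumes that the first name listed for an IP address in the hosts file (the canonical name) is the one that matters for this comparison.

```python
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map each IP in hosts-file text to the names listed for it, in order."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        ip, *names = line.split()
        entries.setdefault(ip, []).extend(names)
    return entries


def robot_name_mismatch(hosts_text: str, ip: str, robot_name: str) -> bool:
    """True if the hosts file names `ip` and its first name is not the robot name.

    Assumption: only the first (canonical) name for the address is compared.
    """
    names = parse_hosts(hosts_text).get(ip, [])
    return bool(names) and names[0].lower() != robot_name.lower()
```

For a hosts file containing the line "10.0.0.5 app-host-01" and a robot named uim-robot-01 (both made-up values), robot_name_mismatch returns True, which marks the server as a candidate for the settings described in this article.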

Result: robot and source were stored as different values in the database.

Out of the box, UIM cannot be prevented from reading the hosts file, so the changes listed below must be implemented.

Most importantly, on the robots where QOS metric collection was affected, enable the following controller option:

"Set QoS source to robot name instead of computer hostname"
In robot.cfg, the corresponding setting is: set_qos_source = yes

The robot should then ignore the hosts file and DNS and use the robot name defined in robot.cfg.
Make sure robotname = is NOT blank. It must not be empty.
In the MCS UI, adjust the Setup_cdm profile by enabling "Allow QoS Source as Target."

Change this setting in the MCS "Setup_cdm" profile for the related group or groups. If you instead use Raw Configure to change it in the cdm.cfx package and distribute the change, MCS will most likely overwrite it within 4-5 minutes.
Lastly, clear the niscache and reset the robot dev id using the controller callbacks:

Select the controller probe and press Ctrl-P to open the probe utility.
Then click the cogwheel to select Expert Mode.
Clear the niscache and reset the robot dev id using the controller callbacks.
Try this on a few machines first to confirm that it works after the robot restarts and the monitoring intervals for CPU and memory have passed.

Additional Information

For MCS

To enable "Allow QoS Source as Target" for cdm in bulk on all of the application servers, change the setting in the MCS "Setup_cdm" profile for the related group or groups.

If you only use Raw Configure to change it in the cdm.cfx package and distribute it, MCS will most likely overwrite it within 4-5 minutes.

This was an environment-specific issue, but it can apply to any environment, whether MCS or non-MCS configuration is used.

Potential Symptoms:

a) robot and source are stored as different values in the database

b) CPU and Memory QOS are not being collected
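
Symptom a) can be checked with a query that compares the two values. The sketch below runs against an in-memory SQLite stand-in rather than a real UIM database. It assumes that S_QOS_DATA exposes robot, source, qos and target columns, all sample rows are invented, and the comparison is case-sensitive, so adapt it to the actual database engine and schema before use.

```python
import sqlite3

# Stand-in for the UIM database: only the columns needed for this check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE S_QOS_DATA (qos TEXT, robot TEXT, source TEXT, target TEXT)"
)
conn.executemany(
    "INSERT INTO S_QOS_DATA VALUES (?, ?, ?, ?)",
    [
        # Invented rows: the first one shows the robot/source mismatch.
        ("QOS_CPU_USAGE", "uim-robot-01", "app-host-01", "app-host-01"),
        ("QOS_CPU_USAGE", "uim-robot-02", "uim-robot-02", "uim-robot-02"),
    ],
)

# Rows where the robot name and the QOS source differ.
MISMATCH_QUERY = """
    SELECT robot, source, qos
    FROM S_QOS_DATA
    WHERE robot <> source
    ORDER BY robot, qos
"""

mismatches = conn.execute(MISMATCH_QUERY).fetchall()
for robot, source, qos in mismatches:
    print(f"{qos}: robot={robot} source={source}")
```

With the sample rows above, only the uim-robot-01 row is reported. Any rows returned in a real environment identify robots to review against the workaround in this article.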

In this customer's UNIX environment, the /etc/hosts file references the application hostname, that is, the hostname under which the monitored application runs, rather than the source hostname used for monitoring. Hence robot and source were stored as different values in the S_QOS_DATA table.

To work around this issue:

Set robotname in robot.cfg (if not already configured). robotname cannot be empty.
In the controller probe, enable the option: "Set QoS source to robot name instead of computer hostname."
In the cdm probe, enable the option: "Allow QoS source as target."

If any of the settings listed above are not enabled, cdm QOS collection may not work as expected.
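
The two robot.cfg settings named in this article (robotname and set_qos_source) can be verified with a small script before restarting robots. This is a simplified sketch: the function names read_cfg_keys and check_robot_cfg are hypothetical, and it assumes it is acceptable to flatten all key = value lines in the file and ignore the <section> structure.

```python
from typing import Dict, List


def read_cfg_keys(text: str) -> Dict[str, str]:
    """Collect `key = value` pairs from config text, skipping <section> lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("<") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_robot_cfg(text: str) -> List[str]:
    """Return a list of problems with the two robot.cfg settings checked here."""
    cfg = read_cfg_keys(text)
    problems: List[str] = []
    if not cfg.get("robotname"):
        problems.append("robotname is missing or empty")
    if cfg.get("set_qos_source", "").lower() != "yes":
        problems.append("set_qos_source is not set to yes")
    return problems
```

An empty list from check_robot_cfg means both settings look correct. The cdm option "Allow QoS source as target" is not covered by this sketch and still needs to be confirmed in the cdm probe or in the MCS profile.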

Essentially, these settings force the robot and source values to match when they end up different in the database table because the robot hostname resolves one way while the target application location is referenced another way.

This may happen in other customer environments whenever source and robot end up being stored differently in the database. If that happens, you may find that the QOS metrics are not being collected.

These changes ensure that all CPU and memory metrics are collected and visible in the OC Metric View.

These settings and results were confirmed in a large customer environment.