The first thing to do is to query the database directly to see if the data exists. This can be accomplished via the SLM portlet or the SQL interface on your database server.
Two tables should be checked: S_QOS_DATA and the corresponding RN table(s). The first query checks the S_QOS_DATA table to confirm that QOS_DEFINITION messages are being received and processed.
SELECT * FROM S_QOS_DATA WHERE probe = '<probe name>';
If no data was returned, jump to TOPIC 2 and 3 below for troubleshooting hub queues and data_engine.
NOTE: Use probe = 'pollagent' when checking QOS data produced by the snmpcollector probe.
You will want to note the table_id and r_table fields from the above query for the second query:
SELECT * FROM <r_table> WHERE table_id = <table_id> ORDER BY sampletime DESC;
If there is no data for the last 24 hours, jump to TOPIC 2 and 3 below.
If there is data within the last 24 hours, go to TOPIC 4.
NOTE: If there is no data for the last 24 hours, USM won't show metric trees (in the Metrics tab) for the metric.
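The two queries and the 24-hour check above can be sketched as a small script. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real UIM database, the probe name 'cdm' and the table name RN_QOS_DATA_0001 are example values, the demo tables contain only the columns named in this article, and storing sampletime as 'YYYY-MM-DD HH:MM:SS' text is an assumption made for the demo.

```python
import sqlite3
from datetime import datetime, timedelta


def newest_samples(conn, probe):
    """Map each (table_id, r_table) of a probe to its newest sampletime (or None)."""
    result = {}
    rows = conn.execute(
        "SELECT table_id, r_table FROM S_QOS_DATA WHERE probe = ?", (probe,)
    ).fetchall()
    for table_id, r_table in rows:
        # Table names cannot be bound as parameters, so validate before formatting.
        if not r_table.replace("_", "").isalnum():
            raise ValueError(f"unexpected r_table name: {r_table!r}")
        newest = conn.execute(
            f"SELECT MAX(sampletime) FROM {r_table} WHERE table_id = ?", (table_id,)
        ).fetchone()[0]
        result[(table_id, r_table)] = newest
    return result


def stale_metrics(conn, probe, now, window=timedelta(hours=24)):
    """List (table_id, r_table) pairs with no sample inside the window."""
    cutoff = (now - window).isoformat(sep=" ", timespec="seconds")
    return [key for key, newest in newest_samples(conn, probe).items()
            if newest is None or newest < cutoff]


# Demo data: illustrative values only, not taken from a real installation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S_QOS_DATA (table_id INTEGER, probe TEXT, r_table TEXT);
    CREATE TABLE RN_QOS_DATA_0001 (table_id INTEGER, sampletime TEXT);
    INSERT INTO S_QOS_DATA VALUES (1, 'cdm', 'RN_QOS_DATA_0001');
    INSERT INTO S_QOS_DATA VALUES (2, 'cdm', 'RN_QOS_DATA_0001');
    INSERT INTO RN_QOS_DATA_0001 VALUES (1, '2024-01-02 11:00:00');
    INSERT INTO RN_QOS_DATA_0001 VALUES (2, '2023-12-25 09:00:00');
""")
now = datetime(2024, 1, 2, 12, 0, 0)
print(stale_metrics(conn, "cdm", now))
```

An empty result from newest_samples corresponds to the first query returning no rows (TOPIC 2 and 3); entries reported by stale_metrics correspond to metrics with no data in the last 24 hours.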
2) Hub queues
3) data_engine
Provided that the queues are set up correctly, the next place to inspect is data_engine. data_engine has two jobs related to this topic. One job is to prepare the schema by setting up the entries in S_QOS_DEFINITION and S_QOS_DATA. These are created by QOS_DEFINITION messages generated by the probe on startup.
If you didn't see the entries in the S_QOS_DATA table in the query above, you'll see errors in the data_engine log when you restart the problem probe. Set the data_engine log to level 3 to catch these. If no errors are found, deactivating the data_engine probe and activating it again sometimes makes a difference.
If you have already seen data in the S_QOS_DATA table then QOS_DEFINITION messages are being processed and set up correctly. In that case, there may be a problem with the QOS definition that won’t allow it to save the monitored data. We sometimes see issues when the definition has been set up with a hasmax value but the probe isn’t sending data with a max value. Again, this will be logged in the data_engine log. The steps to fix depend on the situation and a support ticket is probably the best way to approach this.
Usually, if the data is showing up in the database, it should also show up in Performance Reports Designer (PRD), as PRDs are closely aligned with the S_QOS_DATA table. It’s a good idea to double-check a PRD to make sure it will graph your data; however, we most often see problems in USM.
If the issue is that the data is not seen in USM, there are three possible problems:
a) The device doesn't exist in inventory
If the device doesn't exist in inventory, then it could be a failure on discovery_server's part. There are a few reasons why this might happen. Depending on the probe architecture, it could be a queue issue or it could be an inability of the discovery_server to contact the robot that the probe is installed on.
Probes that rely on discovery queues to publish inventory data are:
and other storage probes in general.
In this case, it is necessary to ensure that there are discovery queues in place to pass the discovery messages up to your Primary hub. The parent hub of the robot needs to have an “ATTACH” queue that listens for probe_discovery messages.
The hub that retrieves data from that hub also needs a queue as stated above unless that hub is the primary hub, at which point the discovery_server creates its own listening queue.
If the hubs are not configured with the queues, then those queues need to be created along with the corresponding “GET” queues and the probe needs to be restarted.
b) There are correlation problems with devices in inventory and the data is matched to an unexpected entry or the data is attached to an unexpected device.
There are many tables that rely on JOIN statements to form a complete chain from CM_COMPUTER_SYSTEM to S_QOS_DATA. The queries below verify that this chain is complete.
Log back into the database and run the following queries:
SELECT * FROM S_QOS_DATA WHERE probe = '<probe name>';
Choose one of those results and copy the ci_metric_id value. Then run the following query.
SELECT * FROM CM_CONFIGURATION_ITEM_METRIC WHERE ci_metric_id = '<ci_metric_id>';
If no data is returned, jump to TOPIC c below.
If data is returned, take the ci_id value from the returned record and run
SELECT * FROM CM_CONFIGURATION_ITEM WHERE ci_id = '<ci_id>';
Then take the dev_id from the returned record and run
SELECT * FROM CM_DEVICE WHERE dev_id = '<dev_id>';
Then take the cs_id from the returned record and run
SELECT * FROM CM_COMPUTER_SYSTEM WHERE cs_id = '<cs_id>';
This will return the entry in USM that you will find the QOS data listed under. Sometimes, it is not the device you are expecting.
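The chain of lookups above can also be walked programmatically. The sketch below is illustrative only: it uses Python's built-in sqlite3 module with an in-memory stand-in database, the demo tables hold only the key columns named in this article, and the IDs ('M1', 'C1', 'D1', 7) are made-up values. It follows one ci_metric_id through the four tables and reports the first table where the chain breaks.

```python
import sqlite3

# (table, column to match on, column that feeds the next lookup)
CHAIN = [
    ("CM_CONFIGURATION_ITEM_METRIC", "ci_metric_id", "ci_id"),
    ("CM_CONFIGURATION_ITEM", "ci_id", "dev_id"),
    ("CM_DEVICE", "dev_id", "cs_id"),
    ("CM_COMPUTER_SYSTEM", "cs_id", None),
]


def trace_chain(conn, ci_metric_id):
    """Follow one ci_metric_id through the chain.

    Returns (cs_id, None) when the chain is complete, or
    (None, table) naming the first table with no matching row.
    """
    key = ci_metric_id
    for table, key_col, next_col in CHAIN:
        select_col = next_col or key_col
        row = conn.execute(
            f"SELECT {select_col} FROM {table} WHERE {key_col} = ?", (key,)
        ).fetchone()
        if row is None:
            return None, table
        key = row[0]
    return key, None


# Demo data: made-up IDs, minimal columns only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CM_CONFIGURATION_ITEM_METRIC (ci_metric_id TEXT, ci_id TEXT);
    CREATE TABLE CM_CONFIGURATION_ITEM (ci_id TEXT, dev_id TEXT);
    CREATE TABLE CM_DEVICE (dev_id TEXT, cs_id INTEGER);
    CREATE TABLE CM_COMPUTER_SYSTEM (cs_id INTEGER);
    INSERT INTO CM_CONFIGURATION_ITEM_METRIC VALUES ('M1', 'C1'), ('M2', 'C2');
    INSERT INTO CM_CONFIGURATION_ITEM VALUES ('C1', 'D1');
    INSERT INTO CM_DEVICE VALUES ('D1', 7);
    INSERT INTO CM_COMPUTER_SYSTEM VALUES (7);
""")
print(trace_chain(conn, "M1"))  # complete chain
print(trace_chain(conn, "M2"))  # breaks at CM_CONFIGURATION_ITEM
```

A break at CM_CONFIGURATION_ITEM_METRIC corresponds to the "no data returned" case above. A complete chain gives the cs_id of the CM_COMPUTER_SYSTEM entry that the QOS data is listed under in USM.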
c) There is a ci_metric_id mismatch
ci_metric_id mismatches can be figured out fairly quickly. The first step is to go to the robot, clear out the niscache folder and restart the robot. This ensures that we don't have an old robot device ID, which all metric IDs are ultimately based on. This commonly happens on cloned VMs that already have a robot installed on them.
Then pull up DrNimbus and watch for any QOS_MESSAGE from the target probe. When you see a message from that probe, click on it. Look for a field called met_id. You’ll need to manually type the met_id into your query below as DrNimbus does not allow copy/paste.
SELECT * FROM S_QOS_DATA WHERE ci_metric_id = '<met_id>';
If this query doesn't return data, run the following update to clear the stored ci_metric_id values for the probe:
UPDATE S_QOS_DATA SET ci_metric_id = NULL WHERE probe = '<probe name>';
Then restart data_engine and wait for the probe to send metrics again. Check USM to see whether your data shows up.
Next, check the same met_id against the CM_CONFIGURATION_ITEM_METRIC table:
SELECT * FROM CM_CONFIGURATION_ITEM_METRIC WHERE ci_metric_id = '<met_id>';
If this query doesn't return data, check the discovery_server logs for errors related to the robot that hosts the probe. The cause may be one of the issues discussed in TOPIC a.
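The two met_id lookups in this topic can be combined into one read-only check. This is a minimal sketch under the same assumptions as before: Python's built-in sqlite3 module stands in for the real database, the demo tables hold only the columns named in this article, and the probe name and IDs are made up. It only reports where the met_id is known and does not run the UPDATE.

```python
import sqlite3


def met_id_findings(conn, met_id):
    """Report, per table, whether the met_id seen in DrNimbus is present."""
    findings = {}
    for table in ("S_QOS_DATA", "CM_CONFIGURATION_ITEM_METRIC"):
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE ci_metric_id = ?", (met_id,)
        ).fetchone()[0]
        findings[table] = count > 0
    return findings


# Demo data: made-up values, minimal columns only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S_QOS_DATA (probe TEXT, ci_metric_id TEXT);
    CREATE TABLE CM_CONFIGURATION_ITEM_METRIC (ci_metric_id TEXT, ci_id TEXT);
    INSERT INTO S_QOS_DATA VALUES ('cdm', 'M1');
    INSERT INTO CM_CONFIGURATION_ITEM_METRIC VALUES ('M1', 'C1'), ('M9', 'C9');
""")
print(met_id_findings(conn, "M1"))
print(met_id_findings(conn, "M9"))
```

A False for S_QOS_DATA points to the ci_metric_id reset described above; a False for CM_CONFIGURATION_ITEM_METRIC points to the discovery_server checks.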