The qos_processor queue is backing up regularly and we see messages similar to the following in the qos_processor.log:
Jul 17 14:02:16:359 [Qos Monitor Database Updater #1, qos_processor] Did not update qos withupdate s_qos_data set origin = '<QoS origin>', host = '<IP address>', robot = '<robot name>', probe = '<probe name>', nim_origin = '<NIM origin>', modifier = 'nimsoft', samplerate = 300 where checksum = '40FA76A95474C94A5D9F53A194A5D9346517D2A7'
Note that this article was written for version 1.24 of the qos_processor probe, but it should also apply to earlier releases if QoS data is missing from the S_QOS_DATA table or if the checksums of the current entries in the S_QOS_DATA table are mismatched.
In this scenario, the qos_processor_qos_message queue was backing up because the qos_processor could not locate an entry in the S_QOS_DATA table, for the probe and robot named in the error message, whose checksum matches the checksum recorded in that message.
The qos_processor is single-threaded, so when it receives a large number of updates that it cannot match in the S_QOS_DATA table, processing slows down to the point that the probe's queue starts backing up.
There are three things you can try to eliminate this problem:
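The lookup described above can be checked by hand. The following is a minimal sketch, not the probe's actual logic: it assumes a DB-API connection that uses `?` placeholders, and it uses an in-memory SQLite table as a stand-in for S_QOS_DATA, with only the columns that appear in the log message. The function name `find_checksum_match` and the sample robot, probe, and stored checksum values are hypothetical.

```python
import sqlite3

def find_checksum_match(conn, robot, probe, checksum):
    """Return (rows_for_pair, rows_with_matching_checksum) for one robot/probe pair."""
    cur = conn.cursor()
    cur.execute(
        "SELECT checksum FROM S_QOS_DATA WHERE robot = ? AND probe = ?",
        (robot, probe),
    )
    checksums = [row[0] for row in cur.fetchall()]
    matching = [c for c in checksums if c == checksum]
    return len(checksums), len(matching)

# Stand-in table for illustration only; the real S_QOS_DATA has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S_QOS_DATA (robot TEXT, probe TEXT, checksum TEXT)")
conn.execute(
    "INSERT INTO S_QOS_DATA VALUES (?, ?, ?)",
    ("robot-a", "cdm", "0000000000000000000000000000000000000000"),
)

# Checksum taken from the log message, checked against the stored rows.
total, matched = find_checksum_match(
    conn, "robot-a", "cdm", "40FA76A95474C94A5D9F53A194A5D9346517D2A7"
)
print(total, matched)
```

A result with zero rows for the pair suggests the entries are missing from the table, while rows with no matching checksum suggest a checksum mismatch.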
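To see which robot and probe pairs account for most of the failed updates, the log can be tallied. This is a rough sketch that assumes every failure line follows the format of the message shown earlier; the pattern accepts both `withupdate` and `with update`. The function name `tally_failed_updates` and the sample hosts, robots, and probes are hypothetical.

```python
import re
from collections import Counter

# Pulls robot, probe, and checksum out of a "Did not update qos" log line.
FAILED_UPDATE = re.compile(
    r"Did not update qos with\s*update s_qos_data set .*?"
    r"robot = '(?P<robot>[^']*)', probe = '(?P<probe>[^']*)'.*?"
    r"where checksum = '(?P<checksum>[^']*)'"
)

def tally_failed_updates(lines):
    """Count failed updates per (robot, probe) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = FAILED_UPDATE.search(line)
        if match:
            counts[(match.group("robot"), match.group("probe"))] += 1
    return counts

PREFIX = "Jul 17 14:02:16:359 [Qos Monitor Database Updater #1, qos_processor] "
TEMPLATE = (
    "Did not update qos withupdate s_qos_data set origin = 'hub1', "
    "host = '10.0.0.5', robot = '{robot}', probe = '{probe}', "
    "nim_origin = 'hub1', modifier = 'nimsoft', samplerate = 300 "
    "where checksum = '40FA76A95474C94A5D9F53A194A5D9346517D2A7'"
)
sample_lines = [
    PREFIX + TEMPLATE.format(robot="robot-a", probe="cdm"),
    PREFIX + TEMPLATE.format(robot="robot-a", probe="cdm"),
    PREFIX + TEMPLATE.format(robot="robot-b", probe="net_connect"),
    PREFIX + "some unrelated message",
]

counts = tally_failed_updates(sample_lines)
for (robot, probe), n in counts.most_common():
    print(robot, probe, n)
```

In practice the lines would come from reading qos_processor.log; the pairs with the highest counts are the natural first candidates for the workarounds.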
1. If you are not using the qos_processor to modify origins of your QoS data and you have no plans to ever use this, then the first workaround you can use is to change the setting of the origin-change-detection-enabled key in the <setup> section of the qos_processor probe from the default value of true to false. You would change this from the probe's Raw Configure GUI. Select the setup folder from the left-hand pane, then select the origin-change-detection-enabled key from the right-hand pane. Change the value for this key to false, save the change, apply it, and restart the probe.
2. The second workaround you can use is to deactivate and then activate the probe generating the error message in the qos_processor.log file on the robot where the probe is installed. The QoS metrics from the probe may have been inadvertently deleted from the S_QOS_DATA table, which would cause this problem. Restarting the probe in this way should force the missing entries to be added back to the S_QOS_DATA table and may resolve the problem.
3. The third alternative is to completely remove all of the QoS metrics collected by the probe on the robot reporting the error in the qos_processor.log file. The downside of this method is that you will lose all of the metrics collected from this probe and will be starting fresh. Here are the steps that you would need to follow to implement this workaround:
1. Deactivate the probe that is generating the error in the qos_processor.log file on the robot where it is installed.
2. Deactivate the data_engine probe.
3. It is highly recommended that you back up your database before making any changes so that you can recover in case there are problems.
4. From a utility like SQL Server Management Studio, execute the following command:
delete from S_QOS_DATA where probe = '<probe name>' and robot = '<robot name>';
where <probe name> and <robot name> correspond to the ones that appear in the qos_processor.log error message.
5. Activate the data_engine probe.
6. Activate the probe that you deactivated in step 1.
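Before running the delete in step 4 of the third workaround, it can help to count the rows it will remove. The sketch below is one possible way to do that, under stated assumptions: only the DELETE statement comes from this article, while the count, the optional `expected_max` guard, and the function name `delete_probe_qos` are additions for illustration. An in-memory SQLite table stands in for S_QOS_DATA, and the sketch does not replace the database backup in step 3.

```python
import sqlite3

def delete_probe_qos(conn, probe, robot, expected_max=None):
    """Count the matching rows, then delete them and commit. Returns the count."""
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM S_QOS_DATA WHERE probe = ? AND robot = ?",
        (probe, robot),
    )
    count = cur.fetchone()[0]
    # Optional guard: refuse to delete more rows than the caller expects.
    if expected_max is not None and count > expected_max:
        raise ValueError(f"{count} rows match, more than expected_max={expected_max}")
    try:
        cur.execute(
            "DELETE FROM S_QOS_DATA WHERE probe = ? AND robot = ?",
            (probe, robot),
        )
        conn.commit()
    except Exception:
        conn.rollback()  # leave the table unchanged if the delete fails
        raise
    return count

# Stand-in table for illustration only; names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S_QOS_DATA (robot TEXT, probe TEXT, checksum TEXT)")
conn.executemany(
    "INSERT INTO S_QOS_DATA VALUES (?, ?, ?)",
    [
        ("robot-a", "cdm", "c1"),
        ("robot-a", "cdm", "c2"),
        ("robot-b", "cdm", "c3"),
    ],
)

removed = delete_probe_qos(conn, "cdm", "robot-a")
remaining = conn.execute("SELECT COUNT(*) FROM S_QOS_DATA").fetchone()[0]
print(removed, remaining)
```

Using bound parameters rather than pasting the probe and robot names into the statement avoids quoting mistakes when the values are copied from the log.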