Correcting issues with samplemax errors

Article ID: 91804

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

data_engine may fail to insert data when the QOS_DEFINITION for a metric conflicts with the data a probe submits for that same metric.

This happens most often when the cdm and vmware probes are in use in the environment and monitoring the same systems.  These probes, in some cases, will share definitions for metrics like QOS_CPU_USAGE and QOS_MEMORY_USAGE.

Additional background on this is available here: https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=34371


This article describes a process which can be used to recover from this situation but should be used with extreme caution; it is therefore "internal only" and should be leveraged by support engineers who fully understand the issue and its implications.

Environment

Release:
Component: UIMCDM

Resolution

This resolution assumes the following:

- the CDM probe is submitting QoS with the 'samplemax' defined (verified with DrNimbus)
- the vmware probe is submitting the same QoS without the 'samplemax' included (again, DrNimbus can verify this)
- the data_engine is throwing errors that specifically use the wording "samplemax included in data but not definition."  If the error instead says "samplemax is missing", this procedure does not apply; refer to the article linked in the Issue/Introduction section above.
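
The distinction between the two error wordings can be checked mechanically before proceeding. The following is a minimal sketch, not a supported tool: only the two quoted phrases come from this article, while the function name, the return labels, and the sample log lines are hypothetical and used purely for illustration.

```python
from typing import Optional

# The two wordings quoted in this article; matching is case-insensitive.
INCLUDED = "samplemax included in data but not definition"
MISSING = "samplemax is missing"


def classify_samplemax_error(log_line: str) -> Optional[str]:
    """Return 'included', 'missing', or None for one data_engine log line."""
    lowered = log_line.lower()
    if INCLUDED in lowered:
        return "included"  # the case this article addresses
    if MISSING in lowered:
        return "missing"   # the case covered by the linked article
    return None            # not a samplemax error at all


if __name__ == "__main__":
    # Hypothetical log lines; the real data_engine log format may differ.
    samples = [
        "data_engine: QOS_MEMORY_USAGE samplemax included in data but not definition",
        "data_engine: QOS_CPU_USAGE samplemax is missing",
        "data_engine: bulk insert completed",
    ]
    for line in samples:
        print(classify_samplemax_error(line), "<-", line)
```

Only lines classified as 'included' match the assumptions of this resolution.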

--NOTE: ENSURE A BACKUP OF THE DATABASE EXISTS SO YOU CAN RESTORE IF ANYTHING GOES WRONG-- 

1. Obtain the qos_def_id for the offending metric from either S_QOS_DATA or S_QOS_DEFINITION. In this example we'll use QOS_MEMORY_USAGE.
2. Verify in S_QOS_DEFINITION that the 'hasmax' column for this QoS definition is '0', indicating that the definition does not expect a samplemax.
3. STOP DATA_ENGINE PROBE. (very important)
4. Substitute that qos_def_id for ??? in this query:
UPDATE S_QOS_DEFINITION SET HASMAX=1 WHERE QOS_DEF_ID = ???; 
COMMIT;   --this is necessary only on Oracle

5. The qos_def_id also identifies the data tables for the metric, zero-padded to four digits; e.g. data for qos_def_id 33 is stored in RN_QOS_DATA_0033 and the historic data is in HN_QOS_DATA_0033.

6. Using DrNimbus, determine what value the CDM probe is sending as the 'samplemax' value - for example "648027".

Use that value to complete the queries below, replacing ???? with the zero-padded qos_def_id. They add the samplemax column to the RN, HN, and DN tables and set it to '648027' by default for the existing rows.
---NOTE: this assumes that the 'samplemax' value being transmitted by the CDM probe is 648027. Substitute the actual value if it differs.

MSSQL/MySQL:
--note: WITH VALUES is MSSQL syntax; omit it on MySQL, where existing rows still receive the default.
ALTER TABLE RN_QOS_DATA_???? ADD SAMPLEMAX NUMERIC(28,2) NOT NULL DEFAULT 648027 WITH VALUES;
ALTER TABLE HN_QOS_DATA_???? ADD SAMPLEMAX NUMERIC(28,2) NOT NULL DEFAULT 648027 WITH VALUES;
ALTER TABLE DN_QOS_DATA_???? ADD SAMPLEMAX NUMERIC(28,2) NOT NULL DEFAULT 648027 WITH VALUES;

Oracle:
ALTER TABLE RN_QOS_DATA_???? ADD SAMPLEMAX NUMBER(28,2) DEFAULT 648027 NOT NULL; 
ALTER TABLE HN_QOS_DATA_???? ADD SAMPLEMAX NUMBER(28,2) DEFAULT 648027 NOT NULL; 
ALTER TABLE DN_QOS_DATA_???? ADD SAMPLEMAX NUMBER(28,2) DEFAULT 648027 NOT NULL; 
COMMIT; 
--note: these queries may take quite some time if there is a large amount of data. If so, it might be safer to issue the COMMIT; after each query rather than at the end. 


7. Change the configuration in the vmware probe for QOS_MEMORY_USAGE to hasmax=648027. (Again, this assumes that '648027' is the value from CDM.)
8. Restart the vmware probe.
9. Activate data_engine.

End result: the vmware probe should include 'samplemax 648027' in its QOS_MEMORY_USAGE data, matching the CDM probe, and data_engine should accept both, because the definition now has 'hasmax=1' set and the samplemax column exists in the RN, HN, and DN tables.
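
The placeholder substitution in steps 4 through 6 is easy to get wrong by hand, so it can help to generate the statements and review them before running anything. The following is a minimal sketch under stated assumptions: the function names and the dialect keys are hypothetical, the four-digit padding is inferred from the qos_def_id 33 example, and the SQL Server column type is written as NUMERIC(28,2), which is an assumption rather than something confirmed against the UIM schema. The script only builds strings; it does not connect to a database.

```python
TABLE_PREFIXES = ("RN", "HN", "DN")

# One ALTER template per dialect; the "mssql" type name is an assumption.
ALTER_TEMPLATES = {
    "oracle": "ALTER TABLE {table} ADD SAMPLEMAX NUMBER(28,2) DEFAULT {value} NOT NULL;",
    "mssql": "ALTER TABLE {table} ADD SAMPLEMAX NUMERIC(28,2) NOT NULL DEFAULT {value} WITH VALUES;",
}


def qos_table_names(qos_def_id: int) -> list:
    """Return the RN/HN/DN table names, zero-padding the id to four digits."""
    if not isinstance(qos_def_id, int) or qos_def_id < 0:
        raise ValueError("qos_def_id must be a non-negative integer")
    return [f"{prefix}_QOS_DATA_{qos_def_id:04d}" for prefix in TABLE_PREFIXES]


def build_statements(qos_def_id: int, samplemax: int, dialect: str) -> list:
    """Build the UPDATE from step 4 followed by the ALTERs from step 6."""
    if dialect not in ALTER_TEMPLATES:
        raise ValueError(f"unsupported dialect: {dialect}")
    if not isinstance(samplemax, int):
        raise ValueError("samplemax must be an integer in this sketch")
    tables = qos_table_names(qos_def_id)  # also validates qos_def_id
    statements = [
        f"UPDATE S_QOS_DEFINITION SET HASMAX=1 WHERE QOS_DEF_ID = {qos_def_id};"
    ]
    template = ALTER_TEMPLATES[dialect]
    statements.extend(template.format(table=t, value=samplemax) for t in tables)
    return statements


if __name__ == "__main__":
    # Example values taken from this article: qos_def_id 33, samplemax 648027.
    for statement in build_statements(33, 648027, "oracle"):
        print(statement)
```

Review the printed statements against the values obtained from S_QOS_DEFINITION and DrNimbus, and run them only with data_engine stopped and a database backup in place.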