Differential Analysis explanation

Article ID: 196170

Products

CA Application Performance Management Agent (APM / Wily / Introscope)
CA Application Performance Management (APM / Wily / Introscope)
INTROSCOPE
DX Application Performance Management

Issue/Introduction

This knowledge document expands on the explanation of Differential Analysis given in the documentation.

Environment

Release : 10.7.0

Component : APM EM

Resolution

Differential Analysis (DA) is built on standard deviation (SD) and is calculated for Average Response Time (ART) metrics by default. SD is a standard statistical measure of how a metric's values, here ART, are distributed around their mean.

For DA we calculate SD, 2*SD, and 3*SD around the mean, which gives the DA bands. Each band has a probability attached to it, i.e. the percentage of values expected to fall within the band: about 68%, 95.5%, and 99.7% respectively. These percentages hold when ART follows a normal distribution. ARTs do so when things are normal (only random variation is present) and do not when something is consistently skewing them.
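The band calculation can be illustrated with a minimal sketch. It assumes a plain population standard deviation over a fixed list of samples; how the product derives the mean and SD internally is not stated in this article, and the function name `da_bands` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def da_bands(art_samples):
    """Return (mean, sd, bands); bands[k-1] is the (low, high) range for k*SD."""
    mu = mean(art_samples)
    sd = pstdev(art_samples)  # population standard deviation (an assumption)
    bands = [(mu - k * sd, mu + k * sd) for k in (1, 2, 3)]
    return mu, sd, bands

# Hypothetical ART samples in milliseconds, one per interval.
samples = [100, 102, 98, 101, 99, 100, 103, 97]
mu, sd, bands = da_bands(samples)
for k, (low, high) in enumerate(bands, start=1):
    print(f"band {k}: {low:.1f} ms to {high:.1f} ms")
```

A value is "outside band k" when it lies more than k*SD away from the mean.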

On top of DA we apply the Western Electric rules:
1: 1 value outside the 3rd band (i.e. value > 3*SD => less than 100% - 99.7% ~ 0.3% probability)
2: 2 of 3 values outside the 2nd band (i.e. value > 2*SD => less than (100% - 95.5%)^2 * 3 ~ 0.6% probability)
3: 4 of 5 values outside the 1st band (i.e. value > SD => less than (100% - 68%)^4 * 5 ~ 5% probability)
4: 10 consecutive rising values
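The probabilities quoted for rules 1 to 3 can be reproduced with a short calculation. This sketch assumes a standard normal distribution and uses the same rough estimates as the rules above (probability of one value outside the band, raised to the number of required values, times the window size); the names `coverage`, `rule1`, `rule2`, and `rule3` are illustrative only.

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a normally distributed value lies within k*SD of the mean."""
    return 2 * NormalDist().cdf(k) - 1

# Probability of a single value falling outside each band.
p_out_1 = 1 - coverage(1)
p_out_2 = 1 - coverage(2)
p_out_3 = 1 - coverage(3)

# Rough estimates in the same form as the rule descriptions.
rule1 = p_out_3               # 1 value outside the 3rd band
rule2 = p_out_2 ** 2 * 3      # 2 of 3 outside the 2nd band
rule3 = p_out_1 ** 4 * 5      # 4 of 5 outside the 1st band

for name, p in (("rule 1", rule1), ("rule 2", rule2), ("rule 3", rule3)):
    print(f"{name}: about {p * 100:.1f}%")
```

The point of the arithmetic is that each rule describes something unlikely under purely random variation, so a violation suggests that something is consistently influencing the metric.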

Every metric value that violates a rule adds to the instability count for its period. 
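As a rough illustration of how the rules might be evaluated for the newest value, consider the sketch below. Several details are assumptions rather than documented behavior: deviations are treated as two-sided, rule 4 is read as ten values each strictly greater than the previous one, and the hypothetical function `violated_rules` simply returns the rule numbers triggered so that the caller can add them to the instability count.

```python
def violated_rules(values, mu, sd):
    """Return the rule numbers (1-4) triggered by the newest value in `values`.

    `values` is ordered oldest to newest; `mu` and `sd` are the mean and
    standard deviation that define the DA bands.
    """
    dev = [abs(v - mu) for v in values]  # two-sided distance (assumption)
    hits = []
    if dev[-1] > 3 * sd:                                   # rule 1
        hits.append(1)
    if len(dev) >= 3 and sum(d > 2 * sd for d in dev[-3:]) >= 2:  # rule 2
        hits.append(2)
    if len(dev) >= 5 and sum(d > sd for d in dev[-5:]) >= 4:      # rule 3
        hits.append(3)
    last10 = values[-10:]
    if len(last10) == 10 and all(a < b for a, b in zip(last10, last10[1:])):
        hits.append(4)                                     # rule 4
    return hits

# Hypothetical example: mean 100 ms, SD 2 ms, newest value 107 ms.
instability_count = len(violated_rules([100, 101, 107], mu=100, sd=2))
```

Here the newest value is more than 3*SD from the mean, so only rule 1 fires and the count for that period grows by one.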

Instability counts are kept for a window, by default 20 intervals of 15 seconds = 300 seconds = 5 minutes. So by default there are 20 counts covering the last 5 minutes.
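One simple way to picture this window is a fixed-length queue that drops the oldest count whenever a new one arrives. The sketch below uses the default figures from the paragraph above; the constant names and the stand-in counts are hypothetical, and the product's actual data structure is not described here.

```python
from collections import deque

INTERVAL_SECONDS = 15    # length of one interval
WINDOW_INTERVALS = 20    # number of counts kept

# A deque with maxlen discards the oldest entry automatically.
window = deque(maxlen=WINDOW_INTERVALS)

# Feed 25 stand-in instability counts; only the latest 20 are retained.
for count in range(25):
    window.append(count)

window_seconds = INTERVAL_SECONDS * len(window)
print(len(window), "counts covering", window_seconds, "seconds")
```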

This gives a set of instability counts from which the variance metric is calculated, and a choice of how to sum them: 1: all values weigh in equally; 2: recent values weigh in more and older values weigh in less; 3: only the most recent values weigh in.
You can think of this as aging out older values. Decay is how much the oldest value is reduced, from 0% to 100%.

The higher the decay, the faster old values age out. 100% is the fastest: the oldest count is reduced by 100%. The newest value is always reduced by 0%.
Progressively older values are reduced linearly with their age. If decay is 20 (i.e. the oldest value is reduced by 20%) and we have 20 counts, each step back in time adds about 1% of reduction, giving reductions of 0, 1, 2, 3, 4, ..., 20%.
If decay is 40, still with 20 counts, the step is about 2%: 0, 2, 4, ..., 40%.
If decay is 100, still with 20 counts, the step is about 5%: 0, 5, 10, 15, ..., 100%.
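The linear reduction can be sketched as follows. The hypothetical function `reductions` takes the endpoints literally (newest reduced by 0%, oldest reduced by the full decay), which makes the step `decay / (n - 1)`. That is an assumption: for 20 counts it gives steps slightly above the round 1%, 2%, and 5% figures quoted above, which match exactly only for 21 points.

```python
def reductions(decay_percent, n_counts):
    """Reduction in percent for each count, ordered newest to oldest."""
    if n_counts < 2:
        return [0.0] * n_counts
    # Step chosen so the oldest count is reduced by exactly decay_percent
    # (an assumption about how the endpoints are handled).
    step = decay_percent / (n_counts - 1)
    return [age * step for age in range(n_counts)]

r = reductions(20, 20)
print(f"newest: {r[0]:.2f}%, second: {r[1]:.2f}%, oldest: {r[-1]:.2f}%")
```

With a decay of 0 every reduction is 0%, so all counts weigh in equally; with a decay of 100 the oldest count contributes nothing.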

The reductions are then applied to the counts, and the reduced counts are summed to give the variance metric value.
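Putting the pieces together, the summation might look like the sketch below. The function name `variance_value` is hypothetical, and the step size `decay / (n - 1)` is an assumption chosen so that the newest count keeps its full weight and the oldest is reduced by the full decay.

```python
def variance_value(counts_newest_first, decay_percent):
    """Sum instability counts after applying a linear, age-based reduction."""
    n = len(counts_newest_first)
    step = decay_percent / (n - 1) if n > 1 else 0.0  # assumed step size
    total = 0.0
    for age, count in enumerate(counts_newest_first):
        weight = 1.0 - (age * step) / 100.0  # 1.0 for newest, lower for older
        total += count * weight
    return total

# The same three instabilities score differently depending on their age.
recent = variance_value([3, 0, 0], decay_percent=100)
old = variance_value([0, 0, 3], decay_percent=100)
```

In this example `recent` is 3.0 and `old` is 0.0: with full decay, an instability in the oldest interval no longer contributes, while the same instability in the newest interval counts in full.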

From our research and experience, a variance value of 10-20 is stable, 20-30 is slightly unstable, and 30-40 is unstable. That is why the default settings are 20 and 30 respectively.
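The ranges above translate into a simple classification. In this sketch the function name `classify` and the parameter names `caution` and `danger` are hypothetical labels for the two default settings, and the handling of boundary values and of values below 10 is an assumption.

```python
def classify(variance, caution=20, danger=30):
    """Map a variance metric value to the stability ranges described above."""
    if variance >= danger:
        return "unstable"
    if variance >= caution:
        return "slightly unstable"
    return "stable"

for value in (15, 25, 35):
    print(value, "->", classify(value))
```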

Statistically, SD "drowns" outliers (rogue transactions).
This is why we explain that DA is for detecting instability in your system, drawing attention to anomalies (brewing issues) and problems (user-affecting issues), not for determining user experience. For that, use SLAs and set absolute thresholds.