This knowledge document expands on the information contained in the documentation.
Release : 10.7.0
Component : APM EM
Differential Analysis (DA) is built on standard deviation (SD) and is calculated for Average Response Time (ART) as standard. SD is a standard statistical calculation that measures how a metric's values, here ART, are distributed around their mean.
For DA we calculate SD, 2*SD, and 3*SD, which is what gives us the DA bands. Each band has a probability attached to it, i.e. the percentage of values expected to fall within the band: 68%, 95.5%, and 99.7%. These percentages hold provided ART follows a normal distribution. ARTs do so when things are normal (only random variation) and do not when things are not normal (something is consistently skewing them).
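The band probabilities and band edges can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names `band_coverage` and `da_bands` are hypothetical, the sample ART values are made up, and the use of a population standard deviation over a fixed list of samples is an assumption.

```python
import math
from statistics import mean, pstdev


def band_coverage(k: float) -> float:
    """Probability that a normally distributed value lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


def da_bands(samples):
    """Return (mean, [(lower, upper), ...]) for the 1*SD, 2*SD and 3*SD bands.

    Assumption: population SD over the given samples; how the product
    baselines ART is not described in this article.
    """
    mu = mean(samples)
    sd = pstdev(samples)
    return mu, [(mu - k * sd, mu + k * sd) for k in (1, 2, 3)]


for k in (1, 2, 3):
    print(f"within {k}*SD: {band_coverage(k):.2%}")

# Made-up ART samples in milliseconds, for illustration only.
art_ms = [100, 104, 98, 101, 97, 103, 99, 102]
mu, bands = da_bands(art_ms)
print("mean:", mu)
for k, (low, high) in enumerate(bands, start=1):
    print(f"band {k}: {low:.1f} .. {high:.1f}")
```

The coverage values come out at roughly 68%, 95.5%, and 99.7%, and apply only when the values are normally distributed.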
On top of DA we have the Western Electric rules:
1: 1 value outside the 3rd band (i.e. value > 3*SD => less than 100% - 99.7% ~ 0.3% probability)
2: 2 of 3 values outside the 2nd band (i.e. value > 2*SD => less than (100% - 95.5%)^2 * 3 ~ 0.6% probability)
3: 4 of 5 values outside the 1st band (i.e. value > SD => less than (100% - 68%)^4 * 5 ~ 5% probability)
4: 10 consecutive rising values
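The four rules can be sketched as a small Python check. This is an illustration under assumptions, not the product's code: `violated_rules` is a hypothetical name, "outside a band" is read as an absolute distance from the mean greater than k*SD, and "10 consecutive rising values" is read as ten values each higher than the one before it.

```python
def violated_rules(values, mean, sd):
    """Return the set of rule numbers (1-4) violated by the most recent values.

    values: recent metric values, oldest first.
    Assumption: bands are two-sided, so distance from the mean is absolute.
    """
    def outside(value, k):
        return abs(value - mean) > k * sd

    rules = set()

    # Rule 1: the latest value lies outside the 3rd band.
    if values and outside(values[-1], 3):
        rules.add(1)

    # Rule 2: at least 2 of the last 3 values lie outside the 2nd band.
    last3 = values[-3:]
    if len(last3) == 3 and sum(outside(v, 2) for v in last3) >= 2:
        rules.add(2)

    # Rule 3: at least 4 of the last 5 values lie outside the 1st band.
    last5 = values[-5:]
    if len(last5) == 5 and sum(outside(v, 1) for v in last5) >= 4:
        rules.add(3)

    # Rule 4: the last 10 values are strictly rising.
    last10 = values[-10:]
    if len(last10) == 10 and all(b > a for a, b in zip(last10, last10[1:])):
        rules.add(4)

    return rules


# Mean 100 and SD 10 are made-up numbers for illustration.
print(violated_rules([100, 100, 100, 100, 135], mean=100, sd=10))
```

In this reading a single spike of 135 against a mean of 100 and an SD of 10 trips rule 1 only, because it is more than 3*SD from the mean while its neighbours are not outside any band.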
Every metric value that violates a rule adds to the instability count for its period.
Instability counts are kept for a window, by default 20 intervals of 15 seconds = 300 seconds = 5 minutes. So by default there are 20 counts covering the 5 minutes.
This gives a set of instability counts for calculating variance, and a choice of how to sum up the values:
1: all values weigh in equally;
2: recent values weigh in more and older values weigh in less;
3: only the most recent values weigh in.
You can think of this as ageing out older values. Decay is how much the oldest value is reduced, from 0% to 100%.
The higher the decay, the faster old values age out. 100% is the fastest, where the oldest count is reduced by 100%. The newest value is always reduced by 0%. Progressively older values are reduced linearly, in proportion to their age. If the decay is 20 (i.e. the oldest value is reduced by 20%) and we have 20 counts, each successive count is reduced by a further 1%, giving reductions of 0, 1, 2, 3, 4, ..., 20%. If the decay is 40, still with 20 counts, the step is 2%: 0, 2, 4, ..., 40%. If the decay is 100, still with 20 counts, the reductions are 0, 5, 10, 15, ..., 100%.
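The default window arithmetic can be sketched with a fixed-length queue. This is only an illustration: the constant and function names are hypothetical, and using a `collections.deque` is an assumption about how such a window could be kept, not a description of the product.

```python
from collections import deque

INTERVAL_SECONDS = 15   # length of one reporting interval (default from the text)
WINDOW_INTERVALS = 20   # number of instability counts kept (default from the text)


def new_window():
    """Create an empty window that silently drops the oldest count when full."""
    return deque(maxlen=WINDOW_INTERVALS)


def record_interval(window, violations):
    """Append one interval's instability count; return the seconds covered."""
    window.append(violations)
    return len(window) * INTERVAL_SECONDS


window = new_window()
for count in range(25):          # 25 intervals with made-up counts 0..24
    covered = record_interval(window, count)

print(len(window), covered)      # 20 counts covering 300 seconds
```

Once the window is full, each new interval pushes out the oldest count, so the window always covers the most recent 5 minutes.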
Reductions are then applied to counts and counts are summed to give the variance metric value.
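The decay weighting and summation can be sketched as follows. This is a sketch under stated assumptions, not the product's formula: the function names are hypothetical, the step between neighbouring counts is taken as decay divided by the number of counts (matching the 1%, 2%, and 5% steps in the examples), and the newest count sits at position 0. With exactly 20 counts this puts the oldest count at 19 steps; the example series in the text run up to the full decay value, which would need one more position, and this article does not say which convention the product uses.

```python
def reduction_percents(decay, n):
    """Percent reduction per count, newest first.

    Assumption: step = decay / n, so position i is reduced by i * step percent.
    """
    if n == 0:
        return []
    step = decay / n
    return [i * step for i in range(n)]


def variance_metric(counts, decay):
    """Sum the instability counts (newest first) after applying the decay."""
    reductions = reduction_percents(decay, len(counts))
    return sum(c * (1 - r / 100) for c, r in zip(counts, reductions))


# Made-up window: one violation in each of the 20 intervals.
counts = [1] * 20
for decay in (0, 20, 100):
    print(decay, round(variance_metric(counts, decay), 2))
```

A decay of 0 corresponds to all values weighing in equally, so the result is the plain sum of the counts. Higher decay values shrink the contribution of older intervals.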
From our research experience we observed that a variance value of 10-20 is stable, 20-30 is slightly unstable, and 30-40 is unstable. That's why the default settings are 20 and 30 respectively.
Statistically, SD "drowns" outliers (rogue transactions). That is why we explain that DA is for determining instability in your system, to attract attention to anomalies (brewing issues) and problems (user-affecting issues), not for determining user experience. Use SLAs for that and set absolute thresholds.