When testing a VML policy, some documents do not trigger a response or incident as expected. These may include documents from the positive training set.
Note that other training documents do trigger a VML policy rule response as expected.
Release : 15.8, 16.0
VML data profiles need to be tuned. Unlike other DLP rules, VML does not match specific data; it relies on a trained algorithm. When training data is insufficient, or the similarity / confidence level is not set appropriately, VML policy rules may not respond as expected. Proper training and tuning are part of machine learning optimization.
If a document is not getting the desired response (positive or negative), try the following:
First, check the FINEST level logs written during detection of the files in question. Here are some examples:
Example 1 (very negative):
Message: Distance: [-0.888694] Confidence : [0.0711924] ConfidenceThreshold : 
Message: Text/Document classified as negative with Confidence [0.0711924] for Condition 
Example 2 (very positive):
Message: Distance: [0.999837] Confidence : [0.993636] ConfidenceThreshold : 
Message: Text/Document classified as positive with Confidence [0.993636] for Condition 
Example 3 (barely positive):
Message: Distance: [0.113251] Confidence : [0.813633] ConfidenceThreshold : [0.8099]
Message: Text/Document classified as positive with Confidence [0.813633] for Condition 
In the first example no incident is created. In the last two examples an incident is created. In general, if the confidence is higher than the threshold, an incident will be created. Note: If the distance value is negative, a lower threshold is needed for the document to be classified as positive and create an incident. Negatively trained documents, and documents whose content is very different from the positive training data, will generally have a negative distance value.
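When many files are being tested, it can help to pull these values out of the log with a script rather than reading each line by hand. The following is a minimal sketch, not a supported tool. It assumes the log lines look exactly like the samples in this article; the pattern and the function names `parse_vml_line` and `expects_incident` are illustrative and not part of the product.

```python
import re

# Pattern derived only from the sample log lines in this article (assumption).
# The threshold group is optional because some lines show no threshold value.
LINE_RE = re.compile(
    r"Distance:\s*\[(?P<distance>-?\d*\.?\d+)\]\s*"
    r"Confidence\s*:\s*\[(?P<confidence>\d*\.?\d+)\]\s*"
    r"ConfidenceThreshold\s*:\s*(?:\[(?P<threshold>\d*\.?\d+)\])?"
)


def parse_vml_line(line):
    """Return distance, confidence and threshold from one log line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    threshold = match.group("threshold")
    return {
        "distance": float(match.group("distance")),
        "confidence": float(match.group("confidence")),
        "threshold": float(threshold) if threshold is not None else None,
    }


def expects_incident(record):
    """True if confidence is higher than the threshold; None if threshold is unknown."""
    if record["threshold"] is None:
        return None
    return record["confidence"] > record["threshold"]


if __name__ == "__main__":
    sample = ("Message: Distance: [0.113251] Confidence : [0.813633] "
              "ConfidenceThreshold : [0.8099]")
    record = parse_vml_line(sample)
    print(record, expects_incident(record))
```

Lines that report a confidence just above or just below the threshold are the ones most likely to change outcome after a small tuning adjustment.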
To resolve the issue, go into the console and do one of the following:
1. Add more training data similar to the document that you are testing, in favor of the positive or negative response you are expecting.
2. Tune the VML data profile by adjusting the Similarity value (also known as the confidence threshold). The lower the Similarity threshold, the more incidents and possible false positives you will see. The higher the Similarity threshold, the fewer incidents you will see, at the risk of more false negatives.
Note that a Similarity threshold of 10.0 corresponds to a Confidence Threshold of 1 in the logs, and a Similarity threshold of 5.0 corresponds to a Confidence Threshold of 0.5.
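The relationship between the console value and the logged value can be sketched as a simple division by 10. This is an assumption based only on the two data points given above (10.0 maps to 1, 5.0 maps to 0.5); the 0 to 10 range check and the function names are also illustrative, not product behavior.

```python
def similarity_to_confidence_threshold(similarity):
    """Convert a console Similarity value to the logged Confidence Threshold.

    Assumes a linear mapping (similarity / 10), inferred from the two
    examples in this article. The 0-10 range is also an assumption.
    """
    if not 0.0 <= similarity <= 10.0:
        raise ValueError("similarity is assumed to lie between 0 and 10")
    return similarity / 10.0


def would_create_incident(confidence, similarity):
    """Apply the rule from this article: incident if confidence is higher than the threshold."""
    return confidence > similarity_to_confidence_threshold(similarity)


if __name__ == "__main__":
    # Confidence values taken from the log examples in this article.
    for confidence in (0.0711924, 0.813633, 0.993636):
        print(confidence, would_create_incident(confidence, similarity=8.0))
```

Under the same assumption, a document logged with a confidence of 0.813633 would stop generating incidents if the Similarity value were raised to 8.5, which shows how a small change can flip a borderline result.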