Data Loss Prevention, Data Loss Prevention Discover Suite
Issue/Introduction
You want to tune your Detection and OCR servers to optimize for the hardware and avoid missed detections.
Environment
DLP 15.8 and later.
Resolution
Start by reading the OCR sizing guide, then provision the number of servers and the hardware resources it recommends.
Ensure that the following standard tuning has been applied on the OCR Server itself, in <install_dir>/Protect/config/OCR.properties:
Set num.ocr.workers to the number of logical cores.
Set server.tomcat.max-threads to the value of num.ocr.workers + 1.
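As a quick illustration of the two rules above, the following minimal Python sketch derives both values from a logical core count and prints them as properties lines. The function names are hypothetical helpers for this example only and are not part of DLP. The sketch assumes that os.cpu_count() on the OCR Server reports the logical core count you intend to use; verify that number against your own hardware or VM allocation.

```python
import os


def recommended_ocr_settings(logical_cores):
    """Return the two OCR.properties values for a given logical core count.

    Hypothetical helper: applies the rules from this article
    (workers = logical cores, Tomcat max threads = workers + 1).
    """
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    return {
        "num.ocr.workers": logical_cores,
        "server.tomcat.max-threads": logical_cores + 1,
    }


def format_properties(settings):
    """Render the settings as key=value lines, one per setting."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # os.cpu_count() returns the number of logical CPUs, or None if unknown.
    cores = os.cpu_count() or 1
    print(format_properties(recommended_ocr_settings(cores)))
```

For example, a server with 8 logical cores would use num.ocr.workers=8 and server.tomcat.max-threads=9.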
Verify that the following detection server Advanced Server setting is not too high:
ImageRecognition.NUM_WORKER_THREADS (default 2)
Review the current ContentExtraction.MaxNumImagesToExtract value to confirm that it meets your scanning requirements. This setting affects both OCR and Forms detection.
If you are seeing intermittent bursts of "OCRServiceBusyException", you can tune the following settings in OCRDetection.properties on the detection server:
ocr.client.retry.number (default 2)
ocr.client.retry.delay (default 50 milliseconds)
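To reason about what changing these two values could mean, here is a minimal Python sketch of a generic fixed-delay retry loop. It is an assumption about how a retry count and delay typically interact, not the actual DLP client implementation. ServiceBusyError, call_with_retry, and max_added_wait_ms are hypothetical names used only for illustration.

```python
import time


class ServiceBusyError(Exception):
    """Stand-in for a 'server busy' response; not the DLP exception class."""


def call_with_retry(request, retry_number=2, retry_delay_ms=50, sleep=time.sleep):
    """Call request(); on a busy error, wait and retry up to retry_number times.

    Assumed semantics: a fixed delay between attempts, and the error is
    re-raised once all retries are used up.
    """
    retries_used = 0
    while True:
        try:
            return request()
        except ServiceBusyError:
            if retries_used >= retry_number:
                raise
            retries_used += 1
            sleep(retry_delay_ms / 1000.0)


def max_added_wait_ms(retry_number=2, retry_delay_ms=50):
    """Worst-case time spent waiting between attempts, in milliseconds."""
    return retry_number * retry_delay_ms
```

Under these assumed semantics, the defaults allow up to three attempts in total and add at most 100 milliseconds of waiting per request. Raising either value gives a busy OCR Server more time to recover, at the cost of holding the detection request longer.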
In the Enforce OCR Configuration:
Consider the trade-off between Accuracy and Speed. Lowering accuracy, where appropriate, can improve throughput.
Evaluate whether all of the selected Languages and Dictionaries are required. Removing any that are not needed can improve throughput.
Lastly, you can scale out your OCR service layer further by adding OCR Servers behind a load balancer.