DLP Detection and OCR server tuning



Article ID: 215995



Products

Data Loss Prevention
Data Loss Prevention Discover Suite

Issue/Introduction

You want to tune your Detection and OCR servers to make the best use of the available hardware and to avoid missed detections.

Environment

DLP 15.8 and later.

Resolution

  1. Start by reviewing the OCR sizing guide and provisioning enough OCR servers and hardware resources to meet its recommendations.
     
  2. On the OCR Server itself, confirm that the following standard tuning has been applied in <install_dir>/Protect/config/OCR.properties:
    1. Set num.ocr.workers to the number of logical cores.
    2. Set server.tomcat.max-threads to the value of num.ocr.workers + 1.

  3. On the detection server:

    1. Verify that the Advanced Server setting ImageRecognition.NUM_WORKER_THREADS (default 2) is not set too high.
    2. Review the current ContentExtraction.MaxNumImagesToExtract setting to confirm that it meets your scanning requirements. This setting affects both OCR and Forms detection.
    3. If you see intermittent bursts of OCRServiceBusyException errors, tune the following settings in OCRDetection.properties:

      1. ocr.client.retry.number (default 2)
      2. ocr.client.retry.delay (default 50 milliseconds)

  4. In the Enforce OCR Configuration:

    1. Consider the Accuracy vs. Speed trade-off. Where lower accuracy is acceptable, favoring speed can improve throughput.
    2. Evaluate whether all of the selected Languages and Dictionaries are required. Removing those you do not need can improve throughput.

  5. Finally, you can scale out the OCR service layer further by adding OCR Servers behind a load balancer.
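The two OCR.properties values in step 2 follow directly from the logical core count. The Python sketch below derives them, assuming it is run on the OCR Server itself. The function name ocr_properties is hypothetical, and os.cpu_count() may not reflect CPU limits imposed by a VM or container, so pass the core count explicitly if the two differ.

```python
import os
from typing import Dict, Optional


def ocr_properties(logical_cores: Optional[int] = None) -> Dict[str, int]:
    """Return the step 2 OCR.properties values for a given logical core count.

    num.ocr.workers equals the number of logical cores, and
    server.tomcat.max-threads equals num.ocr.workers + 1.
    """
    if logical_cores is None:
        # os.cpu_count() reports the number of logical CPUs in the system;
        # it can return None, so fall back to 1 in that case.
        logical_cores = os.cpu_count() or 1
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    workers = logical_cores
    return {
        "num.ocr.workers": workers,
        "server.tomcat.max-threads": workers + 1,
    }


if __name__ == "__main__":
    # Print key=value lines for comparison with OCR.properties.
    for key, value in ocr_properties().items():
        print(f"{key}={value}")
```

For example, ocr_properties(16) returns num.ocr.workers = 16 and server.tomcat.max-threads = 17.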
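The retry settings in step 3 can be pictured with the following Python sketch. It assumes, without confirmation from the product documentation, that ocr.client.retry.number counts retries after the first attempt and that ocr.client.retry.delay is a fixed pause between attempts. The names call_with_retry and worst_case_wait_ms, and the OCRServiceBusyException class defined here, are hypothetical stand-ins rather than DLP APIs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class OCRServiceBusyException(Exception):
    """Hypothetical stand-in for the busy error named in this article."""


def call_with_retry(request: Callable[[], T],
                    retry_number: int = 2,
                    retry_delay_ms: int = 50) -> T:
    """Make one attempt plus up to retry_number retries on a busy error.

    Assumption: the delay is fixed between attempts (not exponential).
    """
    attempt = 0
    while True:
        try:
            return request()
        except OCRServiceBusyException:
            if attempt >= retry_number:
                raise  # retries exhausted: surface the busy error
            attempt += 1
            time.sleep(retry_delay_ms / 1000.0)  # fixed pause before retrying


def worst_case_wait_ms(retry_number: int, retry_delay_ms: int) -> int:
    """Total sleep time added when every attempt is rejected as busy."""
    return retry_number * retry_delay_ms
```

Under these assumptions, the defaults allow up to three attempts and add at most 2 × 50 = 100 milliseconds of waiting per OCR request. Raising either value lets the client ride out longer busy bursts at the cost of added latency.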
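Steps 2, 3, and 5 interact: the detection servers generate OCR requests, and the OCR Servers serve them from their worker pools. The rough Python model below compares the two sides. It assumes, hypothetically, that each ImageRecognition.NUM_WORKER_THREADS thread has at most one OCR request in flight and that the load balancer spreads requests evenly across OCR Servers. The Deployment class and ocr_servers_needed function are illustrative names, and the model is a back-of-the-envelope check rather than a substitute for the OCR sizing guide.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    detection_servers: int
    image_threads_per_detection_server: int  # ImageRecognition.NUM_WORKER_THREADS
    ocr_servers: int
    ocr_workers_per_server: int  # num.ocr.workers

    def peak_ocr_requests(self) -> int:
        # Assumption: each image-recognition thread has at most one OCR request in flight.
        return self.detection_servers * self.image_threads_per_detection_server

    def ocr_worker_slots(self) -> int:
        # Assumption: the load balancer spreads requests evenly across OCR Servers.
        return self.ocr_servers * self.ocr_workers_per_server

    def headroom(self) -> int:
        # Negative headroom means peak demand exceeds available OCR workers.
        return self.ocr_worker_slots() - self.peak_ocr_requests()


def ocr_servers_needed(peak_requests: int, workers_per_server: int) -> int:
    """Smallest OCR Server count whose worker slots cover peak_requests."""
    if workers_per_server < 1:
        raise ValueError("workers_per_server must be at least 1")
    return -(-peak_requests // workers_per_server)  # ceiling division
```

For example, 10 detection servers at the default of 2 threads can have 20 requests in flight, which exceeds the 16 workers of a single 16-core OCR Server; under this model, a second OCR Server restores headroom.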

Additional Information