Guidelines for Provisioning Symantec Data Loss Prevention scans for Microsoft SharePoint Targets

Article ID: 170149

Products

Data Loss Prevention Network Discover, Data Loss Prevention

Issue/Introduction

This article contains recommendations and guidelines for configuring Discover Servers to scan Microsoft SharePoint repositories efficiently.

Resolution

Tunable parameters

Symantec recommends the following settings for each Discover server.

  1. crawler.threadpoolsize = 30 (default value - found in crawler.properties file)
    where crawler.threadpoolsize represents the maximum number of crawler threads.
    Note: Use the recommended value only if your setup conforms to the recommended hardware configuration in the table below.

  2. MessageChain.NumChains =
    Virtual Systems: 0.8 * No. of CPU cores
    Hardware Systems: 1.0 * No. of CPU cores
    Note: On CPUs with hyper-threading enabled, you may double these values.

  3. MessageChain.CacheSize = MessageChain.NumChains
    where MessageChain.CacheSize represents the size of the Detection (MessageChain) queue.

  4. FileReader.MaxFileSystemCrawlerMemory = (crawler.threadpoolsize + MessageChain.NumChains + MessageChain.CacheSize) * FileReader.MaxFileSize
    where FileReader.MaxFileSystemCrawlerMemory represents the total run-time memory for all running threads.

  5. BoxMonitor.FileReaderMemory = 4 * FileReader.MaxFileSystemCrawlerMemory
    where BoxMonitor.FileReaderMemory represents a dynamic memory pool holding all run-time data about the FileReader. This value should be less than the assigned system memory.

  6. crawler.grid.follower.queuesize = 2 * crawler.threadpoolsize
    where crawler.grid.follower.queuesize represents the maximum number of files for detection that can be added to the grid queue. This setting is applicable to grid scans only.

  7. crawler.grid.queuesize.multiplier = 4 * crawler.threadpoolsize
    where crawler.grid.queuesize.multiplier represents the grid scan request queue size per detection server. This setting is applicable to grid scans only.

You can use the attached spreadsheet to calculate the recommended values for these parameters.
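
The formulas in the list above can also be sketched in a few lines of Python. This is an illustration only, not a Symantec tool: the names DiscoverTuning and recommend are hypothetical, rounding down the 0.8 factor for virtual systems is an assumption, and max_file_size stands for whatever value FileReader.MaxFileSize has on your server, so the memory results come out in that same unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoverTuning:
    """Hypothetical container; comments give the property each field maps to."""
    threadpool_size: int        # crawler.threadpoolsize
    num_chains: int             # MessageChain.NumChains
    cache_size: int             # MessageChain.CacheSize
    max_crawler_memory: int     # FileReader.MaxFileSystemCrawlerMemory
    file_reader_memory: int     # BoxMonitor.FileReaderMemory
    follower_queue_size: int    # crawler.grid.follower.queuesize (grid scans only)
    queue_size_multiplier: int  # crawler.grid.queuesize.multiplier (grid scans only)

    def fits_in(self, system_memory: int) -> bool:
        """True if BoxMonitor.FileReaderMemory is below the assigned system memory."""
        return self.file_reader_memory < system_memory


def recommend(cpu_cores: int, max_file_size: int, *, virtual: bool = False,
              hyperthreading: bool = False, threadpool_size: int = 30) -> DiscoverTuning:
    """Apply formulas 1 to 7; max_file_size is the FileReader.MaxFileSize value."""
    # 0.8 * cores on virtual systems, 1.0 * cores on hardware systems.
    # Integer arithmetic rounds down; the rounding direction is an assumption.
    num_chains = cpu_cores * 4 // 5 if virtual else cpu_cores
    if hyperthreading:
        num_chains *= 2  # optional doubling allowed by the note in item 2
    cache_size = num_chains
    max_crawler_memory = (threadpool_size + num_chains + cache_size) * max_file_size
    return DiscoverTuning(
        threadpool_size=threadpool_size,
        num_chains=num_chains,
        cache_size=cache_size,
        max_crawler_memory=max_crawler_memory,
        file_reader_memory=4 * max_crawler_memory,
        follower_queue_size=2 * threadpool_size,
        queue_size_multiplier=4 * threadpool_size,
    )


if __name__ == "__main__":
    # Example: 16-core hardware system with hyper-threading, MaxFileSize of 30 (unit assumed).
    tuning = recommend(16, 30, hyperthreading=True)
    print(tuning)
    print("fits in 32 GB expressed in MB:", tuning.fits_in(32 * 1024))
```

With the default crawler.threadpoolsize of 30, the two grid-only values work out to 60 and 120. Treat the memory figures as a starting point and confirm them against the spreadsheet before changing a production server.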

Scan target configuration guidelines

Symantec recommends the following guidelines for configuring SharePoint scan targets:

  • As much as possible, divide the Microsoft SharePoint Site Collections/WebApps uniquely amongst the deployed Discover servers.
  • To avoid scanning unnecessary files, configure filters based on the expected items to be scanned on the basis of the File Type, Date Modified, and file size attributes.

Scan mode guidelines:

  • When you select Grid as the scan mode, ensure that the grid scanning-specific tuning parameters are configured on all of the Discover servers in the grid.
  • To configure a grid scan, you must select at least 2 Discover servers.
  • To initialize a grid scan, at least 2 of the selected Discover servers must be available.

Summary of configuration recommendations

Be aware that:

  • Scan throughput is affected by the available network bandwidth, number of CPU cores, and the total system memory of the participating Discover servers.
  • Scan throughput is affected by the complexity of the configured policies.
  • Scan throughput is affected by the caching of scanned content on SharePoint servers.
  • A higher active user count on a particular SharePoint server could reduce scan performance.
  • Scan performance is affected by the distance between the participating Discover servers and the SharePoint server being scanned.
  • In Grid scan mode, make sure Microsoft SharePoint Servers are configured to allow concurrent requests.

Parameters                                   Recommended Configuration   Recommended Configuration
                                             (Single Server scan)        (Grid scan mode)
Number of CPU cores                          16                          16
RAM (GB)                                     32                          32
FileReaderMemory (GB)                        16                          16
FileReader.MaxFileSystemCrawlerMemory (MB)   2048                        2048
CrawlerThread                                30                          30
MessageChain.NumChains                       32                          32
MessageChain.CacheSize                       64                          64
crawler.grid.follower.queuesize              NA                          60
crawler.grid.queuesize.multiplier            NA                          120

For more information, refer to the grid scanning performance guidelines in the Symantec Data Loss Prevention Administration Guide (15.7 and earlier) or the Symantec DLP Help Center (15.8 and later).