Guidelines for Provisioning Symantec Data Loss Prevention scans for Microsoft SharePoint Targets

Article ID: 170149

Products

Data Loss Prevention Network Discover, Data Loss Prevention

Issue/Introduction

This article contains recommendations and guidelines for configuring Discover Servers to scan Microsoft SharePoint repositories efficiently.

Resolution

Tunable parameters

Symantec recommends the following settings for each Discover server.

  1. crawler.threadpoolsize = 30 (the default value, set in the crawler.properties file)
    where crawler.threadpoolsize represents the maximum number of crawler threads.
    Note: Use the recommended value only if your setup conforms to the recommended hardware configuration in the table below.

  2. MessageChain.NumChains = 1 * number of CPU cores (2 * number of CPU cores if the cores are hyper-threaded)
    where MessageChain.NumChains represents the number of messages that the FileReader processes in parallel.

  3. MessageChain.CacheSize = 2 * MessageChain.NumChains
    where MessageChain.CacheSize represents the size of the Detection (MessageChain) queue.

  4. FileReader.MaxFileSystemCrawlerMemory = (crawler.threadpoolsize + MessageChain.NumChains + MessageChain.CacheSize) * FileReader.MaxFileSize
    where FileReader.MaxFileSystemCrawlerMemory represents the total run-time memory for all running threads.

  5. BoxMonitor.FileReaderMemory = 4 * FileReader.MaxFileSystemCrawlerMemory
    where BoxMonitor.FileReaderMemory represents a dynamic memory pool holding all run-time data about the FileReader. This value should be less than the assigned system memory.

  6. crawler.grid.follower.queuesize = 2 * crawler.threadpoolsize
    where crawler.grid.follower.queuesize represents the maximum number of files for detection that can be added to the grid queue. This setting is applicable to grid scans only.

  7. crawler.grid.queuesize.multiplier = 4 * crawler.threadpoolsize
    where crawler.grid.queuesize.multiplier represents the grid scan request queue size per detection server. This setting is applicable to grid scans only.

You can use the attached spreadsheet to calculate the recommended values for these parameters.
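
As an alternative to the spreadsheet, the formulas above can be sketched in a few lines of Python. This is an illustrative helper, not a Symantec tool: the names `DiscoverTuning` and `recommend` are hypothetical, and the `max_file_size` value in the usage line is a placeholder rather than a documented default. Supply the FileReader.MaxFileSize value from your own configuration; the memory results come out in the same unit as that input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoverTuning:
    """Values derived from the formulas in this article (hypothetical container)."""
    threadpool_size: int        # crawler.threadpoolsize
    num_chains: int             # MessageChain.NumChains
    cache_size: int             # MessageChain.CacheSize
    max_crawler_memory: int     # FileReader.MaxFileSystemCrawlerMemory
    file_reader_memory: int     # BoxMonitor.FileReaderMemory
    follower_queue_size: int    # crawler.grid.follower.queuesize (grid scans only)
    queue_size_multiplier: int  # crawler.grid.queuesize.multiplier (grid scans only)


def recommend(cpu_cores, hyper_threaded, max_file_size, threadpool_size=30):
    """Apply formulas 2 through 7 for one Discover server.

    max_file_size is the configured FileReader.MaxFileSize; both memory
    results are expressed in the same unit as this argument.
    """
    # Formula 2: one chain per core, doubled when the cores are hyper-threaded.
    num_chains = cpu_cores * (2 if hyper_threaded else 1)
    # Formula 3: the detection queue is twice the number of chains.
    cache_size = 2 * num_chains
    # Formula 4: run-time memory for all running threads.
    max_crawler_memory = (threadpool_size + num_chains + cache_size) * max_file_size
    # Formula 5: FileReader memory pool; keep this below the assigned system memory.
    file_reader_memory = 4 * max_crawler_memory
    return DiscoverTuning(
        threadpool_size=threadpool_size,
        num_chains=num_chains,
        cache_size=cache_size,
        max_crawler_memory=max_crawler_memory,
        file_reader_memory=file_reader_memory,
        follower_queue_size=2 * threadpool_size,    # Formula 6
        queue_size_multiplier=4 * threadpool_size,  # Formula 7
    )


# Usage: 16 hyper-threaded cores; max_file_size=16 is a placeholder value.
tuning = recommend(cpu_cores=16, hyper_threaded=True, max_file_size=16)
print(tuning)
```

For 16 hyper-threaded cores and the default thread pool size of 30, the sketch yields 32 chains, a cache size of 64, and grid queue values of 60 and 120. Always compare the computed BoxMonitor.FileReaderMemory against the memory actually assigned to the server before applying it.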

Note: The grid scanning feature for Microsoft SharePoint Server target is available in Symantec Data Loss Prevention from version 15.1 onwards.

Scan target configuration guidelines

Symantec recommends the following guidelines for configuring SharePoint scan targets:

  • Where possible, assign each Microsoft SharePoint site collection or web application to exactly one of the deployed Discover servers, so that no two servers scan the same content.
  • To avoid scanning unnecessary files, configure filters on the File Type, Date Modified, and File Size attributes that match the items you expect to scan.

Scan mode guidelines:

  • When you select Grid as the scan mode, ensure that the grid scanning-specific tuning parameters are configured on all of the Discover servers in the grid.
  • To configure a grid scan, you must select at least 2 Discover servers.
  • To initialize a grid scan, at least 2 of the selected Discover servers must be available.
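
The two server-count rules above apply at different stages: one when the scan is configured, the other when it starts. A minimal sketch of that distinction follows. The function `check_grid_scan` and the server names are hypothetical and are not part of any Symantec API; the sketch only restates the rules as code.

```python
MIN_GRID_SERVERS = 2  # minimum taken from the scan mode guidelines above


def check_grid_scan(selected, available):
    """Return a short status string for a planned grid scan.

    selected:  names of the Discover servers chosen for the scan target
    available: names of the Discover servers that are currently reachable
    """
    chosen = set(selected)
    # Configuration rule: at least 2 Discover servers must be selected.
    if len(chosen) < MIN_GRID_SERVERS:
        return "cannot configure: select at least 2 Discover servers"
    # Initialization rule: at least 2 of the selected servers must be available.
    online = chosen & set(available)
    if len(online) < MIN_GRID_SERVERS:
        return "cannot initialize: fewer than 2 selected servers are available"
    return "ok"


# Usage with hypothetical server names.
print(check_grid_scan(["discover-1", "discover-2", "discover-3"],
                      ["discover-1", "discover-3"]))
```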

Summary of configuration recommendations

Be aware that:

  • Scan throughput is affected by the available network bandwidth, number of CPU cores, and the total system memory of the participating Discover servers.
  • Scan throughput is affected by the complexity of the configured policies.
  • Scan throughput is affected by the caching of scanned content on SharePoint servers.
  • A higher active user count on a particular SharePoint server could reduce scan performance.
  • Scan performance is affected by the distance between the participating Discover servers and the SharePoint server being scanned.
  • In Grid scan mode, make sure Microsoft SharePoint Servers are configured to allow concurrent requests.

Parameters                                    Recommended Configuration   Recommended Configuration
                                              (Single Server scan)        (Grid scan mode)
Number of CPU cores                           16                          16
RAM (GB)                                      32                          32
FileReaderMemory (GB)                         16                          16
FileReader.MaxFileSystemCrawlerMemory (MB)    2048                        2048
CrawlerThread (crawler.threadpoolsize)        30                          30
MessageChain.NumChains                        32                          32
MessageChain.CacheSize                        64                          64
crawler.grid.follower.queuesize               NA                          60
crawler.grid.queuesize.multiplier             NA                          120

For more information, refer to the grid scanning performance guidelines in the Symantec Data Loss Prevention 15.1 Administration Guide.