Slower than expected SharePoint Discover Scan performance

Article ID: 204421

Updated On:

Products

Data Loss Prevention Endpoint Discover

Issue/Introduction

SharePoint Discover scan performance may be lower than expected, with large delays between each site that is crawled and scanned.

Cause

The Discover Crawler makes a call to com.symantec.dlp.sharepoint.connector.soap.SharePointUserGroupDirectory getUserCollectionFromSite for each site.

When the list of users to be returned is very large, retrieving it can introduce large delays in crawling and scanning these sites. If the time taken to retrieve the list of users exceeds the BoxMonitor.HeartbeatGapBeforeRestart setting, you may also experience FileReader restarts. These restarts can cause continual looping on the same site, because the checkpoint resumes the scan at the same point each time the scan starts.

To confirm this, you can enable the following logging on the Discover server in FileReaderLogging.properties:

com.symantec.dlp.sharepoint.connector.provider.impl.SharePointSOAPWSProvider.level = FINE
com.symantec.dlp.sharepoint.connector.soap.SharePointSOAPWSInvoker.level = FINEST
com.symantec.dlp.sharepoint.connector.soap.SharePointUserGroupDirectory.level = FINE
java.util.logging.FileHandler.level = FINEST

This will give you log entries similar to the following for each site:

com.symantec.dlp.sharepoint.connector.provider.impl.SharePointSOAPWSProvider getUserCollectionFromSite
FINE: getUserCollectionFromSite() for http://fqdn/sites/site_name

com.symantec.dlp.sharepoint.connector.soap.SharePointSOAPWSInvoker logSharePointCallTimeInformation
FINEST: Profile - WebServiceCall|getUserCollectionFromSiteWS|http://fqdn/sites/site_name| Elapsed Time = 1412777

com.symantec.dlp.sharepoint.connector.soap.SharePointUserGroupDirectory getUserCollectionFromSite
FINE: SharePointUserOrGroupInfo.getUserCollectionFromSite: 52896 users retrieved
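Once this logging is enabled, the slowest sites can be picked out of the FileReader log with a short script. The following Python is a minimal sketch, not a supported tool. It assumes the log lines look exactly like the samples above, and that the "users retrieved" line for a site follows that site's "Elapsed Time" line. The function name and the log file name in the usage comment are hypothetical. The unit of the Elapsed Time value is not stated in the log, so the script only ranks the values and does not convert them.

```python
import re

# Patterns derived from the sample log entries above; real logs may differ.
ELAPSED_RE = re.compile(
    r"WebServiceCall\|getUserCollectionFromSiteWS\|(?P<site>[^|]+)\|"
    r"\s*Elapsed Time = (?P<elapsed>\d+)"
)
USERS_RE = re.compile(r"getUserCollectionFromSite: (?P<users>\d+) users retrieved")


def summarize_user_collection_calls(lines):
    """Return one dict per getUserCollectionFromSiteWS call, slowest first.

    Each dict has the keys "site", "elapsed" (raw value from the log, unit
    not stated) and "users" (None if no user count line followed the call).
    """
    results = []
    current = None
    for line in lines:
        match = ELAPSED_RE.search(line)
        if match:
            current = {
                "site": match.group("site"),
                "elapsed": int(match.group("elapsed")),
                "users": None,
            }
            results.append(current)
            continue
        match = USERS_RE.search(line)
        # Assumption: the user count belongs to the most recent timed call.
        if match and current is not None and current["users"] is None:
            current["users"] = int(match.group("users"))
    return sorted(results, key=lambda r: r["elapsed"], reverse=True)


# Example usage (the log file name is a placeholder):
# with open("FileReader0.log", encoding="utf-8", errors="replace") as log:
#     for row in summarize_user_collection_calls(log)[:10]:
#         print(row["elapsed"], row["users"], row["site"])
```

Sites that combine a large Elapsed Time with a large user count are the likely candidates for the delay described above.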

Resolution

On each affected Discover server:

  1. Navigate to the Crawler.properties file
  2. Append (or update) the following setting:
    sharepointcrawler.use.usergroupinfocache = true
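When several Discover servers are affected, the append-or-update step can be scripted. The following Python is a hedged sketch rather than a vendor-provided procedure. The helper names and the `.bak` backup convention are assumptions for illustration, and the path to Crawler.properties must be supplied for your installation. The sketch only recognizes `key = value` lines that use `=` as the separator.

```python
from pathlib import Path

SETTING = "sharepointcrawler.use.usergroupinfocache"


def set_property(text, key, value):
    """Return properties text with key set to value, appending it if absent.

    Simplification: only "key=value" lines are recognized; the ":" and
    whitespace separators that Java properties also allow are not handled.
    """
    lines = text.splitlines()
    new_line = f"{key} = {value}"
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # leave comment lines untouched
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def enable_user_group_info_cache(properties_path):
    """Back up the file, then write it back with the cache setting enabled."""
    path = Path(properties_path)
    # latin-1 round-trips every byte, matching the traditional .properties encoding.
    original = path.read_text(encoding="latin-1")
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    backup.write_text(original, encoding="latin-1")
    path.write_text(set_property(original, SETTING, "true"), encoding="latin-1")


# Example usage (replace the placeholder with the real location of the file):
# enable_user_group_info_cache("/path/to/Crawler.properties")
```

Keeping the text transformation in `set_property` separate from the file handling makes it easy to check the result on a copy of the file before touching a live server.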

Additional Information

Discover.Sharepoint.FetchACL

This setting is found under the detection server Advanced Settings. By default, it is enabled and fetches file ACLs to present in incident snapshots. If this information is not needed in incidents, disabling Discover.Sharepoint.FetchACL on the detection servers that perform the Discover scans can reduce total scan time by eliminating the ACL fetches. The improvement in scan performance can be significant when the average scanned file size is small and the file count is high.

 

See also: Discover - SharePoint Scan Improvements