Slower than expected SharePoint Discover Scan performance


Article ID: 204421




Data Loss Prevention Endpoint Discover


SharePoint Discover scan performance may be lower than expected, with large delays observed between each site that is crawled and scanned.


The Discover Crawler makes a call to getUserCollectionFromSite for each site.

When the list of users to be returned is very large, retrieving it can introduce large delays in crawling and scanning these sites. If the time taken to retrieve the list of users exceeds the BoxMonitor.HeartbeatGapBeforeRestart setting, you may also experience FileReader restarts. These can cause continual looping on the same site, because the checkpoint makes the next scan resume at the same point.
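
The looping behaviour described above can be illustrated with a toy model. This is only a sketch of the reasoning, not the product's actual logic: the function name, the use of seconds as the unit, and the sample numbers are all assumptions made for illustration.

```python
def simulate_scan(site_fetch_seconds, heartbeat_gap_seconds, max_restarts=3):
    """Toy model of a checkpointed crawl (hypothetical, not product code).

    site_fetch_seconds: assumed time to retrieve the user list for each site.
    heartbeat_gap_seconds: assumed value of the heartbeat gap setting.
    Returns (sites_completed, restarts).
    """
    checkpoint = 0  # index of the next site to crawl
    restarts = 0
    while checkpoint < len(site_fetch_seconds) and restarts < max_restarts:
        if site_fetch_seconds[checkpoint] > heartbeat_gap_seconds:
            # The user list takes longer than the heartbeat gap, so the
            # process is restarted and resumes from the same checkpoint.
            restarts += 1
            continue
        checkpoint += 1  # site finished; the checkpoint advances
    return checkpoint, restarts


# One slow site (index 2) blocks progress: the scan never gets past it.
completed, restarts = simulate_scan([5, 10, 2000, 5], heartbeat_gap_seconds=960)
print(completed, restarts)
```

In this model the scan stalls at the first site whose user list takes longer to retrieve than the heartbeat gap, which matches the symptom of the same site being crawled repeatedly.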

To confirm this, you can enable the following logging on the Discover server in [...]:
[...] = FINE
[...] = FINEST
[...] = FINE
java.util.logging.FileHandler.level = FINEST

This will give you log entries similar to the following for each site:
getUserCollectionFromSite
FINE: getUserCollectionFromSite() for http://fqdn/sites/site_name
logSharePointCallTimeInformation
FINEST: Profile - WebServiceCall|getUserCollectionFromSiteWS|http://fqdn/sites/site_name| Elapsed Time = 1412777
getUserCollectionFromSite
FINE: SharePointUserOrGroupInfo.getUserCollectionFromSite: 52896 users retrieved
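
To find the slowest sites in a large log, entries like the ones above can be summarized with a short script. The following is a minimal sketch that assumes the log lines match the sample format shown in this article. The function name `summarize` is hypothetical. The elapsed time is reported as the raw number from the log, because its unit is not stated here.

```python
import re

# Matches the FINEST profile line and captures the site URL and elapsed time.
ELAPSED_RE = re.compile(
    r"WebServiceCall\|getUserCollectionFromSiteWS\|(?P<site>[^|]+)\|"
    r"\s*Elapsed Time = (?P<elapsed>\d+)"
)
# Matches the FINE line that reports how many users were retrieved.
USERS_RE = re.compile(r"getUserCollectionFromSite: (?P<users>\d+) users retrieved")


def summarize(lines):
    """Return one record per profiled call, slowest first."""
    results = []
    current = None
    for line in lines:
        match = ELAPSED_RE.search(line)
        if match:
            current = {
                "site": match.group("site"),
                "elapsed": int(match.group("elapsed")),
                "users": None,
            }
            results.append(current)
            continue
        match = USERS_RE.search(line)
        # Attach the user count to the most recent profiled call, if any.
        if match and current is not None and current["users"] is None:
            current["users"] = int(match.group("users"))
    return sorted(results, key=lambda record: record["elapsed"], reverse=True)


sample = [
    "FINE: getUserCollectionFromSite() for http://fqdn/sites/site_name",
    "FINEST: Profile - WebServiceCall|getUserCollectionFromSiteWS|"
    "http://fqdn/sites/site_name| Elapsed Time = 1412777",
    "FINE: SharePointUserOrGroupInfo.getUserCollectionFromSite: 52896 users retrieved",
]
for record in summarize(sample):
    print(record["site"], record["elapsed"], record["users"])
```

To run it against a real log, pass the open file object to `summarize` in place of `sample`. Sites with a large elapsed time and a large user count are the likely cause of the delays.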


On each affected Discover server:

  1. Navigate to the file
  2. Append (or update) the following setting
    sharepointcrawler.use.usergroupinfocache = true
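
If the change has to be made on several Discover servers, the "append or update" step can be scripted. The sketch below is one possible approach, not a vendor-supplied tool. The helper name `upsert_property` is hypothetical, and it works on the text of the properties file, so the caller supplies the file path. Back up the file before modifying it.

```python
import re


def upsert_property(text, key, value):
    """Return properties-file text with `key` set to `value`.

    An existing, uncommented assignment of the key is rewritten in place;
    otherwise the setting is appended on a new line at the end.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*[=:].*$", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    if pattern.search(text):
        # A function replacement avoids any backslash interpretation.
        return pattern.sub(lambda _match: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


KEY = "sharepointcrawler.use.usergroupinfocache"
print(upsert_property("existing.setting = 1\n", KEY, "true"), end="")
```

A typical use is to read the file into a string, pass it through `upsert_property`, and write the result back.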

Additional Information


The Discover.Sharepoint.FetchACL setting is found under the detection server Advanced Settings. By default, this setting is enabled and fetches file ACLs to present in incident snapshots. If this information is not needed in incidents, disabling Discover.Sharepoint.FetchACL on the detection servers that perform the Discover scans can reduce total scan time by eliminating the ACL fetches. When the average scanned file is small and the file count is high, this can significantly improve scan performance.


See also: Discover - SharePoint Scan Improvements