SharePoint Discover scan performance is lower than expected, and there are large delays between each site that is crawled and scanned.
The Discover Crawler makes a call to com.symantec.dlp.sharepoint.connector.soap.SharePointUserGroupDirectory getUserCollectionFromSite for each site.
When the list of users to be returned is very large, retrieving it can introduce large delays in crawling and scanning these sites. If the time taken to retrieve the list of users exceeds the BoxMonitor.HeartbeatGapBeforeRestart setting, you may also experience FileReader restarts. These restarts can cause continual looping on the same site, because the checkpoint resumes the scan at the same point the next time the scan starts.
To confirm this, you can enable the following logging on the Discover server in FileReaderLogging.properties:
com.symantec.dlp.sharepoint.connector.provider.impl.SharePointSOAPWSProvider.level = FINE
com.symantec.dlp.sharepoint.connector.soap.SharePointSOAPWSInvoker.level = FINEST
com.symantec.dlp.sharepoint.connector.soap.SharePointUserGroupDirectory.level = FINE
java.util.logging.FileHandler.level = FINEST
This will give you log entries similar to the following for each site:
com.symantec.dlp.sharepoint.connector.provider.impl.SharePointSOAPWSProvider getUserCollectionFromSite
FINE: getUserCollectionFromSite() for http://fqdn/sites/site_name
com.symantec.dlp.sharepoint.connector.soap.SharePointSOAPWSInvoker logSharePointCallTimeInformation
FINEST: Profile - WebServiceCall|getUserCollectionFromSiteWS|http://fqdn/sites/site_name| Elapsed Time = 1412777
com.symantec.dlp.sharepoint.connector.soap.SharePointUserGroupDirectory getUserCollectionFromSite
FINE: SharePointUserOrGroupInfo.getUserCollectionFromSite: 52896 users retrieved
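When many sites are scanned, it can help to summarize these entries rather than read them by eye. The following is a minimal Python sketch, not a Symantec-provided tool. It assumes the log lines look exactly like the sample above. The function name and regular expressions are hypothetical and may need adjusting for your actual FileReader log. The sample does not state the unit of Elapsed Time, so the value is reported as-is.

```python
import re

# Patterns derived from the sample log entries above (an assumption;
# real log lines may carry extra prefixes or differ between versions).
PROFILE_RE = re.compile(
    r"WebServiceCall\|getUserCollectionFromSiteWS\|(?P<site>[^|]+)\|"
    r"\s*Elapsed Time = (?P<elapsed>\d+)"
)
USERS_RE = re.compile(
    r"getUserCollectionFromSite: (?P<users>\d+) users retrieved"
)


def summarize_user_fetches(lines):
    """Collect site URL, elapsed time, and user count for each call.

    A "users retrieved" line is attached to the most recent profile
    line that does not yet have a user count.
    """
    results = []
    current = None
    for line in lines:
        match = PROFILE_RE.search(line)
        if match:
            current = {
                "site": match.group("site"),
                "elapsed": int(match.group("elapsed")),
                "users": None,
            }
            results.append(current)
            continue
        match = USERS_RE.search(line)
        if match and current is not None and current["users"] is None:
            current["users"] = int(match.group("users"))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python summarize_user_fetches.py <path to FileReader log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        entries = summarize_user_fetches(handle)
    # List the slowest calls first.
    for entry in sorted(entries, key=lambda e: e["elapsed"], reverse=True):
        print(entry["elapsed"], entry["users"], entry["site"])
```

Sorting by elapsed time puts the sites with the slowest user retrieval at the top, which are the most likely candidates for the delays described here.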
On each affected Discover server, consider disabling the following setting:
Discover.Sharepoint.FetchACL
This setting is found under the detection server Advanced Settings. By default, it is enabled and fetches file ACLs to present in incident snapshots. If this information is not needed in incidents, disabling Discover.Sharepoint.FetchACL on the detection servers that perform the Discover scans can reduce total scan time by eliminating the ACL fetches. When the average scanned file is small and the file count is high, this can significantly improve scan performance.
See also: Discover - SharePoint Scan Improvements