IDM indexing of a document fails to complete successfully
Article ID: 405282
Products
Data Loss Prevention Core Package
Data Loss Prevention Enforce
Issue/Introduction
After creating an IDM index of a document (e.g. a PDF), the Enforce UI may show that the index was successfully created and replicated to the DLP detection servers, yet the index exhibits problems such as:
It's not possible to enable Endpoint Partial Matching for the index
Detection does not work on the index
Cause
First, check the localhost log on Enforce and look for the following line:
[com.vontu.profiles.manager.document.DocumentSourceIndexCreator] IDM Indexing took X.X seconds. Estimated memory usage: X bytes.
If text was extracted successfully from the indexed file or files, the estimated memory usage of an IDM index will likely be in the range of a few hundred to a few thousand bytes. The amount depends on the size of the indexed files: the larger the source file, the more content there is to extract from it. If the memory usage estimate is lower than expected, this suggests that content extraction from the document did not complete and that the created index does not contain the correct content of the document. This can affect detection accuracy and can prevent Endpoint Partial Matching from being enabled.
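As a rough aid for this check, the following Python sketch scans localhost log text for the IDM indexing line and flags entries with a small memory estimate. The function name find_index_stats and the min_bytes threshold of 100 are illustrative assumptions, not values documented for DLP; choose a threshold that fits the size of the indexed files. The pattern also assumes that the numbers are logged as plain digits.

```python
import re

# Pattern mirrors the localhost log line quoted in this article.
# Assumption: seconds and bytes are logged as plain digits (no separators).
INDEX_LINE = re.compile(
    r"IDM Indexing took\s+(\d+(?:\.\d+)?)\s+seconds\.\s+"
    r"Estimated memory usage:\s+(\d+)\s+bytes"
)


def find_index_stats(log_text, min_bytes=100):
    """Return (seconds, bytes, suspicious) for each IDM indexing line.

    min_bytes is a hypothetical threshold chosen for illustration only.
    """
    results = []
    for line in log_text.splitlines():
        match = INDEX_LINE.search(line)
        if match:
            seconds = float(match.group(1))
            size = int(match.group(2))
            # Flag estimates below the threshold for manual review.
            results.append((seconds, size, size < min_bytes))
    return results
```

To use it, read the localhost log into a string and pass it to find_index_stats. Any entry whose third field is True is worth comparing against the size of the source document.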
NOTE: For partial file contents matching, there must be at least 300 normalized characters in the indexed file. However, the exact length is variable depending on the file contents and encoding. You can find out more here: Using IDM to Detect Exact and Partial File Contents
The second log to check is ContentExtractionHost_Manager.log, around the time the indexing was performed. You may see entries similar to the following:
DEBUG | cehost | CEPluginManager [9556] | [14268] | config file not found: E:\ProgramData\Symantec\DataLossPrevention\EnforceServer\<version>\ContentExtractionData\PluginData\Verity\plugin_settings, Exception thrown from : HostPluginConfigReader.cpp(106) | C:\VontuDev\workDir\ContentExtractionAPI\CEHost\CEPluginManager.cpp (295)
ERROR | cehost | Verity [9556] | [14268] | Could not initialize the Verity library: GetKvFilterLibHandle("C:\Program Files\Symantec\DataLossPrevention\KeyView\<version>\Protect\plugins\contentextraction\Verity\x64") failed | C:\VontuDev\workDir\ContentExtractionAPI\Plugins\Verity\VerityLib\src\VerityImplInternal.c (213)
WARN | cehost | CEPluginManager [9556] | [14268] | Failed to load Verity. Error: Plugin Startup - Initialization of plugin Verity failed. retVal = 1, context = 0000000000000000. Skipping this plugin | C:\VontuDev\workDir\ContentExtractionAPI\CEHost\CEPluginManager.cpp (258)
According to these entries, Enforce failed to initialize the Verity plugin, which is responsible for extracting content from files during indexing. This leads to incorrect content extraction and to the subsequent problems with the index.
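The search for these entries can be sketched in Python as follows. This is a minimal example that looks only for the two message fragments quoted in this article; the function name find_verity_failures is hypothetical, and other DLP versions may word the messages differently.

```python
# Message fragments taken from the log entries quoted in this article.
# Assumption: the wording is the same in the log being inspected.
FAILURE_MARKERS = (
    "Could not initialize the Verity library",
    "Failed to load Verity",
)


def find_verity_failures(log_text):
    """Return (line_number, marker) for each Verity failure message found."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for marker in FAILURE_MARKERS:
            if marker in line:
                hits.append((number, marker))
    return hits
```

Pass the contents of ContentExtractionHost_Manager.log to find_verity_failures. An empty result means that neither message was found, which does not by itself prove that the plugin loaded correctly.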
Resolution
Locate the file plugin_settings.txt under the following path: <drive>:\Program Files\Symantec\DataLossPrevention\ContentExtractionService\<version>\Plugins\Protect\plugins\contentextraction\Verity
Inside the file, check the value of the following parameter: keyViewDirectory=C:\Program Files\Symantec\DataLossPrevention\KeyView\<version>\Protect\plugins\contentextraction\Verity\x64
The path may be set incorrectly, for example if DLP was installed on a drive other than C: or in a custom location.
Update the keyViewDirectory parameter to a valid KeyView path and restart the DLP services on Enforce. Re-creating the IDM index from the same document should then produce a correct index that works in detection and, if the file contains enough content, allows Endpoint Partial Matching to be enabled.
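Before restarting the services, the configured path can be verified with a short Python sketch such as the one below. It assumes that plugin_settings.txt holds simple key=value lines, as the keyViewDirectory example above suggests. The function names are hypothetical, and the check must run on the Enforce server itself so that the path is resolved against the correct file system.

```python
import os


def read_key_view_directory(settings_text):
    """Return the keyViewDirectory value, or None if the key is absent.

    Assumption: the settings file uses one key=value pair per line.
    """
    for raw in settings_text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "keyViewDirectory":
            return value.strip()
    return None


def check_key_view_directory(settings_text):
    """Return (path, exists), where exists is True if path is a directory."""
    path = read_key_view_directory(settings_text)
    if path is None:
        return (None, False)
    return (path, os.path.isdir(path))
```

A result of (path, False) indicates that the configured directory does not exist on this machine, which matches the misconfiguration described above.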