You want to understand the following warning, error, and informational entries in the ContentExtractionHost_FileReader logs:
05/21/21 11:53:08 | WARN | cehost | ImageExtractorPlugin [6132] | [11932] | Failed to initialize extraction instance. | PdfExtractor.cpp (82)
05/21/21 11:53:08 | ERROR | cehost | ImageExtractorPlugin [6132] | [11932] | File type can not be identified | ImageExtractorPluginLib.cpp (198)
05/21/21 11:53:08 | INFO | cehost | FileTypeIdentifierRequestExecutor [6132] | [11932] | Plugin File Type Identification from stream failed in plugin ImageExtractorPlugin with retval = 1, Exception thrown from : SPIExecutor.cpp(143) | FileTypeIdentifierRequestExecutor.cpp (149)
DLP 15.x
One cause is that encrypted PDF files are sent to the Content Extraction Host (CEH), but no configured plugin can decrypt them.
To confirm that encryption is the cause, enable trace logging in the log4cxx_config_filereader.xml file, which is located in the following directory:
Windows: C:\Program Files\Symantec\DataLossPrevention\DetectionServer\<ver>\Protect\config\
Linux: /opt/Symantec/DataLossPrevention/DetectionServer/<ver>/Protect/config
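Update the following nodes in the XML so that the priority of the cehost category is set to trace, then restart the Symantec DLP Detection Server service. The log file path in this example is from a Windows 15.8 installation; keep the path that already exists in your file.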
<appender name="cehostAppender" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="C:/ProgramData/Symantec/DataLossPrevention/DetectionServer/15.8.00000/logs/debug/ContentExtractionHost_FileReader.log" />
<param name="append" value="true" />
<param name="MaxFileSize" value="10240KB" />
<param name="MaxBackupIndex" value="100" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{%m/%d/%y %H:%M:%S} | %-5p | %c{2} | %m%n" />
</layout>
</appender>
<category name="cehost">
<priority value="trace" />
<appender-ref ref="cehostAppender"/>
</category>
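Before restarting the service, it can help to confirm that the edited file still parses and that the cehost category really is set to trace. The following Python sketch is one way to do that. It is not a Symantec tool: the function name category_priority is hypothetical, and it assumes the configuration file is well-formed XML with category and priority elements as shown above.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{uri}category' down to 'category'."""
    return tag.split("}")[-1]


def category_priority(source, category="cehost"):
    """Return the priority value configured for a category, or None.

    `source` is a file path or an open file object containing the XML.
    (Hypothetical helper; assumes the layout shown in this article.)
    """
    root = ET.parse(source).getroot()
    for elem in root.iter():
        if local_name(elem.tag) == "category" and elem.get("name") == category:
            for child in elem:
                if local_name(child.tag) == "priority":
                    return child.get("value")
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_priority.py <path to log4cxx_config_filereader.xml>
    print(category_priority(sys.argv[1]))
```

If the script prints a value other than trace, or raises a parse error, review the edit before restarting the Detection Server service.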
Sample log output showing that a particular PDF file is encrypted (see the "Document is encrypted" entry logged by the Verity plugin):
05/21/21 11:53:08 | TRACE | cehost | Service [6132] | [11932] | Start of Request Id #50701 CEService Id #34 CE Request Type: FILETYPE_IDENTIFICATION | CEService.cpp (985)
05/21/21 11:53:08 | WARN | cehost | ImageExtractorPlugin [6132] | [11932] | Failed to initialize extraction instance. | PdfExtractor.cpp (82)
05/21/21 11:53:08 | ERROR | cehost | ImageExtractorPlugin [6132] | [11932] | File type can not be identified | ImageExtractorPluginLib.cpp (198)
05/21/21 11:53:08 | TRACE | cehost | SPIExecutor [6132] | [11932] | Finished call to DetectTypeFromStream for plugin ImageExtractorPlugin in 42 milliseconds. | SPIExecutorScopedOperationLogger.cpp (45)
05/21/21 11:53:08 | INFO | cehost | FileTypeIdentifierRequestExecutor [6132] | [11932] | Plugin File Type Identification from stream failed in plugin ImageExtractorPlugin with retval = 1, Exception thrown from : SPIExecutor.cpp(143) | FileTypeIdentifierRequestExecutor.cpp (149)
05/21/21 11:53:08 | TRACE | cehost | SPIExecutor [6132] | [11932] | Calling DetectTypeFromStream for plugin Verity. | SPIExecutorScopedOperationLogger.cpp (36)
05/21/21 11:53:08 | TRACE | cehost | Verity [6132] | [11932] | Document is encrypted | ..\DocumentIdentificationHelper.c (201)
05/21/21 11:53:08 | TRACE | cehost | Verity [6132] | [11932] | Failed to open stream | src\VerityImplInternal.c (452)
05/21/21 11:53:08 | TRACE | cehost | Service [6132] | [11932] | End of Request Id #50701 CEService Id #34 CE Request Type: FILETYPE_IDENTIFICATION Execution Time=124 MilliSeconds | CEService.cpp (995)
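On a busy server the trace log grows quickly, so searching it by hand for the "Document is encrypted" entry can be tedious. The following Python sketch shows one way to list the request IDs whose trace contains that entry. It is an illustration only: the function name find_encrypted_requests is hypothetical, and it assumes the pipe-delimited layout of the sample above, with the thread ID in the fifth field and each request bracketed by "Start of Request Id" and "End of Request Id" entries on the same thread.

```python
import re
import sys

START = re.compile(r"Start of Request Id #(\d+)")
END = re.compile(r"End of Request Id #(\d+)")


def find_encrypted_requests(lines):
    """Return request IDs (as strings) whose trace shows 'Document is encrypted'."""
    active = {}    # thread ID -> request ID currently open on that thread
    flagged = []   # request IDs with an encrypted-document entry, in log order
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 7:
            continue  # not a log entry in the expected layout
        thread = fields[4]
        message = " | ".join(fields[5:-1])
        started = START.search(message)
        if started:
            active[thread] = started.group(1)
        elif END.search(message):
            active.pop(thread, None)
        elif "Document is encrypted" in message:
            request_id = active.get(thread)
            if request_id is not None and request_id not in flagged:
                flagged.append(request_id)
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_encrypted.py <path to ContentExtractionHost_FileReader.log>
    with open(sys.argv[1], errors="replace") as log:
        for request_id in find_encrypted_requests(log):
            print("Request Id #" + request_id + " involved an encrypted document")
```

Each request ID reported this way can then be matched against the WARN and ERROR entries logged on the same thread to confirm that the failures come from encrypted files.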