RemoteIDMIndexer filter does not work for Chinese characters

Article ID: 233205

Products

Data Loss Prevention Discover Suite, Data Loss Prevention

Issue/Introduction

The Remote IDM Indexer does not apply a filter (for example, -exclude_filter) when the filter pattern contains Chinese characters.

Environment

Release: 15.x and newer

Component: Remote IDM Indexer

Cause

The Windows console cannot read the Chinese characters in the filter pattern because of its active character encoding (code page), so the pattern reaches the indexer garbled. The console has to be switched to UTF-8 (code page 65001).
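
The effect can be reproduced outside the console. The following is a minimal sketch, assuming the script file is saved as UTF-8 and the console's active code page is 437 (the article does not state which code page was active). The helper name as_seen_by_console is hypothetical and only serves the illustration.

```python
def as_seen_by_console(text: str, console_code_page: str = "cp437") -> str:
    """Reinterpret the UTF-8 bytes of `text` using a legacy console code page.

    Assumption: the script is stored as UTF-8 and the console reads it
    byte by byte with a single-byte code page (cp437 here).
    """
    return text.encode("utf-8").decode(console_code_page)


pattern = "*改模联络*.xlsx"
garbled = as_seen_by_console(pattern)

# Each Chinese character is three bytes in UTF-8, so it turns into
# three unrelated characters once decoded with the wrong code page.
print(garbled)
```

Under these assumptions the printed pattern no longer matches any file name on the share, which is consistent with the exclude filter being silently ignored.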

Resolution

Run the "chcp 65001" command to switch the console to UTF-8 encoding before running the remote indexer script.
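
If the script is generated rather than typed by hand, the file itself also has to be stored as UTF-8 for the code page switch to help. The sketch below is one way to do that, not a vendor-provided tool: the function names build_script_bytes and write_indexer_script are hypothetical, and the choice of UTF-8 without a byte order mark and with CRLF line endings is an assumption about what cmd.exe handles most reliably.

```python
from pathlib import Path


def build_script_bytes(indexer_command: str) -> bytes:
    """Return batch file content that switches to UTF-8 before the indexer runs."""
    lines = [
        "chcp 65001",     # switch the console to UTF-8 first
        indexer_command,  # then run the indexer with the Chinese filter pattern
    ]
    text = "\r\n".join(lines) + "\r\n"
    # Plain "utf-8" writes no byte order mark (assumption: a BOM would
    # otherwise be glued onto the first command in the file).
    return text.encode("utf-8")


def write_indexer_script(path: str, indexer_command: str) -> None:
    """Write the batch file to `path` (hypothetical helper)."""
    Path(path).write_bytes(build_script_bytes(indexer_command))


# Command taken from the working example; "(version)" is left as a placeholder.
COMMAND = (
    r'RemoteIDMIndexer.exe -uri="\\10.32.170.10\share\small" '
    r'-out="C:\output\remoteidm2" '
    r'-index_password_file="C:\Program Files\Symantec\DataLossPrevention'
    r'\Indexers\(version)\Protect\bin\pass.txt" '
    r'-exclude_filter="*改模联络*.xlsx"'
)
```

Calling write_indexer_script("run_indexer.bat", COMMAND) would produce a two-line batch file equivalent to the working example; the file name is only an illustration.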

 

Problem Example

Source files:

Script with exclude filter (does not work):

RemoteIDMIndexer.exe -uri="\\10.32.170.10\share\small" -out="C:\output\remoteidm2" -index_password_file="C:\Program Files\Symantec\DataLossPrevention\Indexers\15.7\Protect\bin\pass.txt" -exclude_filter="*改模联络*.xlsx"

Result: the exclude filter is not applied because the console shows the Chinese characters as unknown characters...

C:\Program Files\Symantec\DataLossPrevention\Indexers\(version)\Protect\bin>RemoteIDMIndexer.exe -uri="\\10.32.170.10\share\small" -out="C:\output\remoteidm2" -index_password_file="C:\Program Files\Symantec\DataLossPrevention\Indexers\(version)\Protect\bin\pass.txt" -exclude_filter="*µöµ¿íΦüöτ£*.xlsx"

Comparing to last indexing run, 2 new document(s) were added, 0 document(s) were updated, 8 document(s) were unchanged, and 0 document(s) were removed.

Index contains 10 files.

Approximate index size: 12,092 bytes.

Shutting down crawling service....Done.

Updating index to disk...Done.

Exiting...

Working Example

Script with "chcp 65001" added as the first line:

chcp 65001
RemoteIDMIndexer.exe -uri="\\10.32.170.10\share\small" -out="C:\output\remoteidm2" -index_password_file="C:\Program Files\Symantec\DataLossPrevention\Indexers\(version)\Protect\bin\pass.txt" -exclude_filter="*改模联络*.xlsx"

Result: the exclude filter is applied correctly and the two matching documents are removed from the index...

C:\Program Files\Symantec\DataLossPrevention\Indexers\(version)\Protect\bin>chcp 65001

Active code page: 65001
C:\Program Files\Symantec\DataLossPrevention\Indexers\(version)\Protect\bin>RemoteIDMIndexer.exe -uri="\\10.32.170.10\share\small" -out="C:\output\remoteidm2" -index_password_file="C:\Program Files\Symantec\DataLossPrevention\Indexers\(version)\Protect\bin\pass.txt" -exclude_filter="*改模联络*.xlsx"
Comparing to last indexing run, 0 new document(s) were added, 0 document(s) were updated, 8 document(s) were unchanged, and 2 document(s) were removed.

Index contains 8 files.

Approximate index size: 9,058 bytes.

Shutting down crawling service....Done.

Updating index to disk...Done.

Exiting...