IDM against .zip file with filename containing non-ASCII characters fails

Article ID: 159794

Products

Data Loss Prevention Endpoint Prevent
Data Loss Prevention Network Monitor
Data Loss Prevention Network Prevent for Email
Data Loss Prevention Enforce
Data Loss Prevention Network Prevent for Web
Data Loss Prevention Network Protect
Data Loss Prevention Endpoint Discover

Issue/Introduction

Creating an IDM index from a .zip file fails when the archive contains a document whose filename includes non-ASCII characters (for example Chinese (Traditional or Simplified), Japanese, Korean, French, or German). The following error is displayed:

Error: Indexing was unsuccessful. Check the log files for details (Could not create version x).

Resolution

The localhost.log file will show an error similar to the following:

 

SEVERE [com.vontu.profiles.manager.InfoSourceIndexCreator] Error during document indexing
Cause:
com.vontu.profiles.common.ProfilesException: Cannot extract zip file C:\Vontu\Protect\documentprofiles\chinese.zip to C:\Vontu\Protect\documentprofiles\chinese-522
com.vontu.profiles.common.ProfilesException: Cannot extract zip file C:\Vontu\Protect\documentprofiles\chinese.zip to C:\Vontu\Protect\documentprofiles\chinese-522
 at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.unpackZip(DocumentSourceIndexCreator.java:621)
 at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.createCrawler(DocumentSourceIndexCreator.java:601)
 at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.doIndex(DocumentSourceIndexCreator.java:143)
 at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.indexInfoSourceOnManager(DocumentSourceIndexCreator.java:118)
 at com.vontu.profiles.manager.InfoSourceIndexCreator.indexListOfDataSources(InfoSourceIndexCreator.java:138)
 at com.vontu.profiles.manager.document.DocumentSourceIndexJob.index(DocumentSourceIndexJob.java:15)
 at com.vontu.profiles.manager.InfoSourceIndexJob.execute(InfoSourceIndexJob.java:52)
 at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
 at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:543)
04 Jan 2009 13:16:25,218- Thread: 14 SEVERE [com.vontu.profiles.manager.InfoSourceIndexCreator] Protect Error 1001: Unexpected indexing error occurred.
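
To confirm that a failed indexing job matches this issue, you can search the log for the extraction message shown above. The following is a minimal Python sketch, not a Symantec-supplied tool. The function name `find_extract_failures` and the log path are assumptions for illustration; point the path at the localhost.log file on your own Enforce server.

```python
from pathlib import Path

# Marker text taken from the log excerpt above.
MARKER = "Cannot extract zip file"


def find_extract_failures(log_text):
    """Return the distinct log lines that report a failed zip extraction."""
    hits = []
    for line in log_text.splitlines():
        line = line.strip()
        # The same message is usually logged more than once; keep one copy.
        if MARKER in line and line not in hits:
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Hypothetical path: adjust to the real location of localhost.log.
    log_path = Path("localhost.log")
    if log_path.exists():
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for hit in find_extract_failures(text):
            print(hit)
    else:
        print("Log file not found:", log_path)
```

If the script prints a "Cannot extract zip file" line for the archive you uploaded, the failure is consistent with the issue described in this article.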

 

RESOLUTION:

As a workaround, index documents whose filenames contain non-ASCII characters from a shared directory instead of uploading the .zip file with the Browse button. Selecting the "Use Remote SMB Share" option and pointing it to the file you wish to index should allow the IDM to be created regardless of the language used in the filename.
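
Before choosing between the Browse upload and the SMB share method, it can help to check whether an archive actually contains non-ASCII filenames. The following Python sketch shows one way to do that with the standard `zipfile` module. It is an illustration only and is not part of the product; the function name `non_ascii_entries` and the sample archive built in memory are assumptions made for the example.

```python
import io
import zipfile


def non_ascii_entries(zip_source):
    """Return the names of archive entries that contain non-ASCII characters."""
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist() if not name.isascii()]


if __name__ == "__main__":
    # Build a small in-memory archive so the sketch runs without external files.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.txt", "entry with an ASCII name")
        archive.writestr("報告.txt", "entry with a non-ASCII name")
    buffer.seek(0)

    flagged = non_ascii_entries(buffer)
    if flagged:
        print("Non-ASCII entry names found; consider the SMB share method:")
        for name in flagged:
            print("  " + name)
    else:
        print("No non-ASCII entry names found.")
```

To check a real archive, pass its path to `non_ascii_entries` in place of the in-memory buffer.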

STATUS:

This issue is resolved in version 11.0 and later.

See:

1498957 IDM indexing fails when the document archive (zip) contains non-ASCII filenames in encodings other than UTF-8
1499768 Users need to be warned against uploading ZIP archive containing non-ASCII filenames for IDM indexing.

Other Causes:

The same error can also occur with archives created with WinZip 11. See TECH221660: "Error when indexing documents zipped with Winzip 11".