Cannot access file during IDM indexing (Linux servers)

Article ID: 198384

Products

Data Loss Prevention Enforce

Issue/Introduction

The Tomcat logs show that certain files fail to index, with a "Cannot access file" warning.

In the logs, the file name contains garbled ("wingding") characters because it uses characters from the Windows-1252 character set.

15 Jun 2020 23:00:06,183- Thread: 85 WARNING [com.vontu.profiles.manager.document.DocumentSourceIndexCreator] Cannot access file:
Cause:
com.vontu.directorycrawler.VontuFileException: /var/Vontu/repo/confdoc/****Engagement_�_FINAL_2.pdf (No such file or directory)
java.io.FileNotFoundException: /var/Vontu/repo/confdoc/****Engagement_�_FINAL_2.pdf (No such file or directory)
java.io.FileNotFoundException: /var/Vontu/repo/confdoc/****Engagement_�_FINAL_2.pdf (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at com.vontu.directorycrawler.VontuFileInputStream.<init>(VontuFileInputStream.java:152)
        at com.vontu.directorycrawler.VontuFileInputStream.<init>(VontuFileInputStream.java:174)
        at com.vontu.directorycrawler.VontuFile.getByteArray(VontuFile.java:1509)
        at com.vontu.directorycrawler.VontuFile.getByteArray(VontuFile.java:1498)
        at com.vontu.directorycrawler.VontuFile.getByteArray(VontuFile.java:1476)
        at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.putDocumentInIndexer(DocumentSourceIndexCreator.java:641)
        at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.doIndex(DocumentSourceIndexCreator.java:415)
        at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.indexInfoSourceOnManager(DocumentSourceIndexCreator.java:309)
        at com.vontu.profiles.manager.InfoSourceIndexCreator.indexListOfDataSources(InfoSourceIndexCreator.java:254)
        at com.vontu.profiles.manager.document.DocumentSourceIndexJob.index(DocumentSourceIndexJob.java:31)
        at com.vontu.profiles.manager.InfoSourceIndexJob.execute(InfoSourceIndexJob.java:75)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
15 Jun 2020 23:00:06,183- Thread: 85 WARNING [com.vontu.profiles.manager.document.DocumentSourceIndexCreator] File name is /apps/Vontu/var/repo/confdoc/*******Engagement_�_FINAL_2.pdf

Cause

When IDM indexing runs on Linux, the indexer may skip files whose names contain characters that are not encoded as UTF-8, such as Windows-1252 characters. This applies both to Enforce servers running on Linux and to the Remote IDM Indexer running on Linux.
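The following minimal Python sketch illustrates why such a file looks "missing". It assumes that the indexer's JVM decodes file names as UTF-8 and replaces each invalid byte with U+FFFD, the "�" seen in the log. The file name and the helper `decode_name_as_utf8` are hypothetical.

```python
def decode_name_as_utf8(raw_name: bytes) -> str:
    # Assumption: names are decoded as UTF-8, and any invalid byte
    # sequence is replaced with U+FFFD ("�").
    return raw_name.decode("utf-8", errors="replace")


# Hypothetical name written by a Windows-1252 client: byte 0x96 is the
# Windows-1252 en dash, which is not valid UTF-8 on its own.
on_disk = b"report_\x96_final.pdf"
seen_by_indexer = decode_name_as_utf8(on_disk)
print(seen_by_indexer)  # report_�_final.pdf

# Re-encoding the decoded name does not reproduce the bytes on disk, so
# opening it fails with "No such file or directory".
print(seen_by_indexer.encode("utf-8") == on_disk)  # False
```

Once the original byte is replaced, the information needed to reopen the file is lost. For this reason the fix belongs on the file names, not in the indexer.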

Resolution

To prevent the indexer from skipping files, make sure that the names of all files to be indexed are encoded in Unicode (UTF-8).
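One way to apply this resolution is sketched below in Python. The script walks a document directory, lists names that are not valid UTF-8, and can optionally rename them. It assumes the offending names were written in Windows-1252, as in the log above. The helpers `find_non_utf8_names` and `rename_cp1252_to_utf8` are hypothetical and are not part of DLP.

```python
import os
import sys
from typing import List, Optional


def find_non_utf8_names(root: bytes) -> List[bytes]:
    """Return paths under root whose last component is not valid UTF-8.

    Walks bottom-up (topdown=False) so entries inside a directory are
    listed before the directory itself; renaming in this order keeps the
    remaining paths valid.
    """
    bad = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad


def rename_cp1252_to_utf8(path: bytes, dry_run: bool = True) -> Optional[bytes]:
    """Re-encode the last component of path from Windows-1252 to UTF-8.

    Assumption: the name was written as Windows-1252. Returns the new path,
    or None if the name is not valid cp1252 or the target already exists.
    With dry_run=True nothing is renamed.
    """
    head, tail = os.path.split(path)
    try:
        new_tail = tail.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        return None
    new_path = os.path.join(head, new_tail)
    if os.path.lexists(new_path):
        return None  # never overwrite an existing file
    if not dry_run:
        os.rename(path, new_path)
    return new_path


def main(argv: List[str]) -> None:
    root = os.fsencode(argv[1])
    apply = "--rename" in argv[2:]
    for path in find_non_utf8_names(root):
        new_path = rename_cp1252_to_utf8(path, dry_run=not apply)
        # Print bytes reprs: the bad names cannot be printed as UTF-8 text.
        print(f"{path!r} -> {new_path!r}")


# Run as a command-line tool only when a directory argument is supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

For example, saving the script as a hypothetical `check_names.py` and running `python3 check_names.py /path/to/share` lists the affected names without changing anything. Adding `--rename` performs the renames. Names containing bytes that are undefined in Windows-1252 are reported with `None` and left for manual review.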