The Tomcat logs show that certain files fail to index, with a "Cannot access file" warning for each one.
In the log, the affected file names contain unreadable ("wingding"-style) characters, which indicates that the names are encoded in the Windows-1252 character set rather than UTF-8.
15 Jun 2020 23:00:06,183- Thread: 85 WARNING [com.vontu.profiles.manager.document.DocumentSourceIndexCreator] Cannot access file:
Cause:
com.vontu.directorycrawler.VontuFileException: /var/Vontu/repo/confdoc/****Engagement_�_FINAL_2.pdf (No such file or directory)
java.io.FileNotFoundException: /var/Vontu/repo/confdoc/****Engagement_�_FINAL_2.pdf (No such file or directory)
java.io.FileNotFoundException: /var/Vontu/repo/confdoc/****Engagement_�_FINAL_2.pdf (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at com.vontu.directorycrawler.VontuFileInputStream.<init>(VontuFileInputStream.java:152)
at com.vontu.directorycrawler.VontuFileInputStream.<init>(VontuFileInputStream.java:174)
at com.vontu.directorycrawler.VontuFile.getByteArray(VontuFile.java:1509)
at com.vontu.directorycrawler.VontuFile.getByteArray(VontuFile.java:1498)
at com.vontu.directorycrawler.VontuFile.getByteArray(VontuFile.java:1476)
at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.putDocumentInIndexer(DocumentSourceIndexCreator.java:641)
at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.doIndex(DocumentSourceIndexCreator.java:415)
at com.vontu.profiles.manager.document.DocumentSourceIndexCreator.indexInfoSourceOnManager(DocumentSourceIndexCreator.java:309)
at com.vontu.profiles.manager.InfoSourceIndexCreator.indexListOfDataSources(InfoSourceIndexCreator.java:254)
at com.vontu.profiles.manager.document.DocumentSourceIndexJob.index(DocumentSourceIndexJob.java:31)
at com.vontu.profiles.manager.InfoSourceIndexJob.execute(InfoSourceIndexJob.java:75)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
15 Jun 2020 23:00:06,183- Thread: 85 WARNING [com.vontu.profiles.manager.document.DocumentSourceIndexCreator] File name is /apps/Vontu/var/repo/confdoc/*******Engagement_�_FINAL_2.pdf
When performing IDM indexing on Linux systems, the indexer may skip files whose names contain characters that are not encoded as Unicode (UTF-8), such as Windows-1252 characters. This applies both to Enforce running on Linux and to the Remote IDM Indexer running on Linux.
To keep the indexer from missing files, make sure that the names of all files to be indexed are encoded in Unicode (UTF-8).
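
One way to find affected files before indexing is to scan the document directory for names that are not valid UTF-8. The Python sketch below is an illustration only and is not part of the product. The function names find_non_utf8_names and suggest_utf8_name are hypothetical, and the assumption that the original names are Windows-1252 (cp1252) should be confirmed for your own files before renaming anything.

```python
import os


def find_non_utf8_names(root):
    """Return paths (as bytes) under root whose file name is not valid UTF-8."""
    bad = []
    # Walking with a bytes path makes os.walk return raw bytes names,
    # so the names are inspected exactly as stored on disk.
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(root)):
        for name in filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad


def suggest_utf8_name(raw_name, source_encoding="cp1252"):
    """Re-encode a raw file name to UTF-8 bytes.

    Assumes the name was written in source_encoding (cp1252 by default).
    Raises UnicodeDecodeError if a byte is undefined in that encoding.
    """
    return raw_name.decode(source_encoding).encode("utf-8")
```

A possible usage is to call find_non_utf8_names on the directory being indexed, review the returned paths, and then rename each file to the value returned by suggest_utf8_name (for example with os.rename) once the source encoding has been verified.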