Set the proper encoding in Detection.EncodingGuessingDefaultEncoding


Article ID: 160315




Data Loss Prevention Enforce


Detection.EncodingGuessingDefaultEncoding is set to the default encoding for a language, but content detection fails for some content in that language.


Some languages have more than one encoding.  Usually one encoding name is in common use, but that encoding may not include platform-dependent characters (characters specific to Microsoft Windows, Mac OS, UNIX, mainframes, and so on).

For example, in Japanese, Shift-JIS does not include the vendor-specific characters defined by Microsoft Windows, IBM, and NEC. Windows-31J includes them, but even Windows-31J does not include the special characters defined by other vendors such as Mac and Fujitsu.
In Chinese, GB2312 is superseded by GBK and GB18030.
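
The difference in coverage between a base encoding and its extended variant can be checked outside the product with a short Java program. The following is a minimal sketch, not part of Data Loss Prevention: the class name EncodingCoverage, the method unencodable, and the sample characters (U+2460 and U+3231, commonly cited as NEC special characters) are assumptions chosen for illustration. It uses only the standard java.nio.charset API, and the result depends on which charsets the local JDK provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class EncodingCoverage {

    /**
     * Returns the code points of text (formatted as U+XXXX) that the
     * named charset cannot encode. Hypothetical helper for illustration.
     */
    public static List<String> unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // Sample characters are an assumption: circled digit one and
        // the parenthesized "stock" ideograph.
        String sample = "\u2460\u3231";
        for (String name : new String[] {"Shift_JIS", "windows-31j"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            System.out.println(name + " cannot encode: " + unencodable(sample, name));
        }
    }
}
```

Any code point reported for the base encoding but not for the extended one is a candidate platform-dependent character.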

If content detection fails, check whether the file contains platform-dependent characters. Detection may fail to recognize these characters.
An extended encoding such as those mentioned above might include the platform-dependent characters.  In that case, try setting the appropriate encoding name in Detection.EncodingGuessingDefaultEncoding.
(Detection.EncodingGuessingDefaultEncoding accepts the encoding names used in Java (JDK).)
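
Because the setting takes Java encoding names, a candidate name can be checked against a JDK before it is entered. The sketch below is an illustration only: the class name CharsetNameCheck and the method canonicalName are hypothetical, and the JVM used for the check may not offer the same charsets as the one running the product.

```java
import java.nio.charset.Charset;

public class CharsetNameCheck {

    /**
     * Returns the canonical JDK name for an encoding name or alias,
     * or null if this JVM does not recognize it.
     * Hypothetical helper for illustration.
     */
    public static String canonicalName(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal names (for example, names with spaces).
            return null;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"Shift_JIS", "windows-31j", "GBK", "GB18030", "not-a-charset"};
        for (String candidate : candidates) {
            String canonical = canonicalName(candidate);
            System.out.println(candidate + " -> "
                    + (canonical == null ? "not recognized by this JVM" : canonical));
        }
    }
}
```

A name that resolves to a canonical charset here is at least a valid Java encoding name. Whether it improves detection for a given file still has to be confirmed in the product.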