When loading a table in Greenplum from Informatica, the source data will have special characters but the Greenplum destination will not have them.
Informatica provides several features to remove diacritics from some of their translation tools. Some of these features are RemoveDiacritics and ASCII_SIMPLIFIED settings which will remove these special characters.
https://kb.informatica.com/whitepapers/4/Pages/1/367890.aspx
"For instance, “ö” is replaced by “o” or “é” is replaced by “e”. SIMPLIFIED never replaces one character with more than one character. It also does not remove any character. That is, the string length always stays the same."
RCA
Greenplum can easily handle these special characters. By default, Greenplum uses UTF8 which is a multi-byte encoding able to handle most languages.
Typically, if translating the text is not possible, we would see a "?" character or a character and <code> which is an expected behavior.
Setting client_encoding can switch between different encoding. See http://gpdb.docs.pivotal.io/4360/guc_config-client_encoding.html for more details.
Latin1 data inserted gpadmin=# SET client_encoding='LATIN1'; SET gpadmin=# INSERT INTO encoding_test values ('Ö_latin'); INSERT 1 gpadmin=# SELECT * from encoding_test; a ----------------------------- Ö_latin (1 row) gpadmin=# SET client_encoding='UTF8'; SET gpadmin=# SELECT * from encoding_test; a ---------------------------------- Ã\u0096_latin (1 row) UTF8/Unicode data inserted gpadmin=# INSERT INTO encoding_test values ('Ö_utf8'); INSERT 1 gpadmin=# SELECT * from encoding_test; a ---------------------------------- Ã\u0096_latin Ö_utf8 (2 rows) gpadmin=# SET client_encoding='LATIN1'; SET gpadmin=# SELECT * from encoding_test; a ----------------------------- Ö_latin _utf8 (2 rows)
Turn off Informatica RemoveDiacritics settings to maintain the special characters when loaded into Greenplum.