Special Characters are Removed when Loading Greenplum with Informatica
search cancel

Special Characters are Removed when Loading Greenplum with Informatica

book

Article ID: 295980

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Symptoms:

When loading a table in Greenplum from Informatica, the source data will have special characters but the Greenplum destination will not have them.

  • For example, ö is replaced by o or é is replaced by e.
  • This affects diacritics such as umlaut, accents, diaeresis, etc.

 

Environment


Cause

Informatica provides several features to remove diacritics from some of their translation tools.  Some of these features are RemoveDiacritics and ASCII_SIMPLIFIED settings which will remove these special characters.

https://kb.informatica.com/whitepapers/4/Pages/1/367890.aspx
"For instance, “ö” is replaced by “o” or “é” is replaced by “e”. SIMPLIFIED never replaces one character with more than one character. It also does not remove any character. That is, the string length always stays the same."

RCA 

Greenplum can easily handle these special characters.  By default, Greenplum uses UTF8 which is a multi-byte encoding able to handle most languages.

Typically, if translating the text is not possible, we would see a "?" character or a character and <code> which is an expected behavior.

Setting client_encoding can switch between different encoding.  See http://gpdb.docs.pivotal.io/4360/guc_config-client_encoding.html for more details.

Latin1 data inserted
gpadmin=# SET client_encoding='LATIN1';
SET
gpadmin=# INSERT INTO encoding_test values ('Ö_latin');
INSERT 1
gpadmin=# SELECT * from encoding_test;
              a
-----------------------------
 Ö_latin
(1 row)

gpadmin=# SET client_encoding='UTF8';
SET
gpadmin=# SELECT * from encoding_test;
                a
----------------------------------
 Ã\u0096_latin
(1 row)
UTF8/Unicode data inserted
gpadmin=# INSERT INTO encoding_test values ('Ö_utf8');
INSERT 1
gpadmin=# SELECT * from encoding_test;
                a
----------------------------------
 Ã\u0096_latin
 Ö_utf8
(2 rows)

gpadmin=# SET client_encoding='LATIN1';
SET
gpadmin=# SELECT * from encoding_test;
              a
-----------------------------
 Ö_latin
 _utf8
(2 rows)

 

Resolution

Turn off Informatica RemoveDiacritics settings to maintain the special characters when loaded into Greenplum.