The Sqoop import job fails to read data from DB2 database which has UTF8 encoding. Essentially, even the data cannot be read at DB2 with select queries as there are some characters which are not in UTF8.
The Sqoop job will throw an error message similar to the one below:
Error: java.io.IOException: SQLException in nextKeyValue at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265) .. .. Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][1065][12306][4.19.26] Caught java.io.CharConversionException. See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null at com.ibm.db2.jcc.am.kd.a(Unknown Source) at com.ibm.db2.jcc.am.kd.a(Unknown Source) .. .. Caused by: java.nio.charset.MalformedInputException: Input length = 527 at com.ibm.db2.jcc.am.s.a(Unknown Source) ... 22 more Caused by: sun.io.MalformedInputException at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105) ... 23 more
Refer to this article for the workaround suggested by IBM to avoid this issue.
To summarize, after setting the parameter db2.jcc.charsetDecoderEncoder=3, an exception will not be thrown when a non-UTF8 character is encountered. However, it will be substituted by its equivalent Unicode replacement character.
This property must be passed to the driver. However if you pass it by supplying "-D db2.jcc.charsetDecoderEncoder=3" to the Sqoop import command, this parameter will be appended to the job configuration for the MapReduce job. This is triggered to import the data and will not reach the DB2 driver and the job will fail.
sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2host:13610/database --username xxxxx --password xxxxx --verbose --target-dir xxx/xxxx --query "select col1 from foo.t1 -m 1 --append
In order to use this property, you can include it in the parameters which are used to define the JVM parameters for invoking Java calls.
For instance, you can modify the value of parameter mapreduce.map.java.opts in mapred-site.xml like below:
<property> <name>mapreduce.map.java.opts</name> <value>-Xmx1024m -Ddb2.jcc.charsetDecoderEncoder=3</value> </property>