Sqoop import job from DB2 fails due to invalid characters "java.io.CharConversionException"
search cancel

Sqoop import job from DB2 fails due to invalid characters "java.io.CharConversionException"

book

Article ID: 294784

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

Symptoms:

The Sqoop import job fails to read data from DB2 database which has UTF8 encoding. Essentially, even the data cannot be read at DB2 with select queries as there are some characters which are not in UTF8.


The Sqoop job will throw an error message similar to the one below:

Error: java.io.IOException: SQLException in nextKeyValue
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
..
..
Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][1065][12306][4.19.26] Caught java.io.CharConversionException.  See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null
        at com.ibm.db2.jcc.am.kd.a(Unknown Source)
        at com.ibm.db2.jcc.am.kd.a(Unknown Source)
..
..
Caused by: java.nio.charset.MalformedInputException: Input length = 527
        at com.ibm.db2.jcc.am.s.a(Unknown Source)
        ... 22 more
Caused 
by: sun.io.MalformedInputException
        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
        ... 23 more

Environment


Resolution

Workaround

Refer to this article for the workaround suggested by IBM to avoid this issue.


To summarize, after setting the parameter db2.jcc.charsetDecoderEncoder=3, an exception will not be thrown when a non-UTF8 character is encountered. However, it will be substituted by its equivalent Unicode replacement character.


How to use it with Sqoop or similar applications

This property must be passed to the driver. However if you pass it by supplying "-D db2.jcc.charsetDecoderEncoder=3" to the Sqoop import command, this parameter will be appended to the job configuration for the MapReduce job. This is triggered to import the data and will not reach the DB2 driver and the job will fail. 


Incorrect way

sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2host:13610/database --username xxxxx --password xxxxx --verbose --target-dir xxx/xxxx --query "select col1 from foo.t1 -m 1 --append

In order to use this property, you can include it in the parameters which are used to define the JVM parameters for invoking Java calls.

For instance, you can modify the value of parameter mapreduce.map.java.opts in mapred-site.xml like below:

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m -Ddb2.jcc.charsetDecoderEncoder=3</value>
</property>