ERROR: invalid byte sequence for encoding "UTF8": 0xe97f00 when fetching data from binary data file

Article ID: 296284

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

The error 'invalid byte sequence for encoding "UTF8": 0xe97f00' typically occurs when the encoding of the external source file differs from the value of the GPDB GUC client_encoding, so that GPDB cannot interpret some of the bytes in the file as valid characters. More details are available in another KB article.
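
As a small illustration of this kind of mismatch (not taken from the case in this article), the sketch below shows that the byte 0xe9 is a complete character in ISO-8859-1, while the same character takes two bytes in UTF-8. The helper name to_utf8_hex and the choice of ISO-8859-1 are assumptions made for the example only; the actual encoding of a given source file has to be confirmed with whoever produced it.

```shell
# Hypothetical helper: read bytes on stdin, treat them as the encoding
# named in $1, convert them to UTF-8, and print the result as hex.
to_utf8_hex() {
    iconv -f "$1" -t UTF-8 | od -An -v -tx1 | tr -d ' \n'
    echo
}

# 0xe9 (octal 351) is "é" in ISO-8859-1; in UTF-8 that character
# is encoded as the two bytes c3 a9.
printf '\351' | to_utf8_hex ISO-8859-1    # prints c3a9
```

A lone 0xe9 that is read as UTF-8 is the lead byte of a three-byte sequence, so it is only valid when followed by two continuation bytes. In the error above it is followed by 0x7f and 0x00, which are not continuation bytes.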

In this case, the file command reports the charset of the source data file as binary:

[gpadmin@smdw load_data]$ file -i BL_NETWORK_NODE_45.dat 
BL_NETWORK_NODE_45.dat: application/octet-stream; charset=binary

The value of the GUC client_encoding is UTF-8:

gpadmin=# show client_encoding
gpadmin-# ;
 client_encoding 
-----------------
 UTF8
(1 row)
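
To confirm that the file really does not decode as UTF-8 before loading it, one option is to let iconv read it as UTF-8 and discard the output. This is a minimal sketch: is_valid_utf8 is a hypothetical helper name, and it relies on iconv exiting with a non-zero status when it meets an illegal input sequence.

```shell
# Hypothetical helper: succeeds (exit 0) only if iconv can decode the
# whole file as UTF-8; the converted output itself is discarded.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example call with the data file from this article (skipped if absent).
data_file=BL_NETWORK_NODE_45.dat
if [ -f "$data_file" ]; then
    if is_valid_utf8 "$data_file"; then
        echo "$data_file decodes as UTF-8"
    else
        echo "$data_file contains bytes that are not valid UTF-8"
    fi
fi
```

Note that this check alone is not sufficient for loading: a NUL byte (0x00) is accepted by iconv as valid UTF-8, but PostgreSQL-based databases reject it in text data.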



Environment

Product Version: 5.2

Resolution

If the source data file cannot be recreated with the correct encoding, the existing file can be converted with the iconv command, provided that iconv supports the file's current encoding. (Remember to back up the original file first!) For example:
[gpadmin@smdw load_data]$ iconv -f binary -t UTF8 BL_NETWORK_NODE_45.dat BL_NETWORK_NODE_45_UTF8.dat 
iconv: conversion from `binary' is not supported
Try `iconv --help' or `iconv --usage' for more information.
In the example above, however, the conversion fails because the data file is reported as binary, and iconv does not support converting from binary.
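
The value binary in the file -i output is a classification rather than a named encoding: file uses it when the content contains bytes that do not look like text, and NUL bytes (0x00, as in the sequence 0xe97f00 from the error) are one common cause. As a hedged diagnostic sketch, the hypothetical helper count_nul_bytes below counts them. A non-zero result suggests that the problem is the content of the file itself, which points back to recreating the file at the source rather than converting its encoding.

```shell
# Hypothetical helper: print how many NUL (0x00) bytes a file contains.
count_nul_bytes() {
    # tr -cd deletes everything except NUL bytes; wc -c counts what is left.
    # LC_ALL=C makes tr treat the input as raw bytes.
    LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Example call with the data file from this article (skipped if absent).
data_file=BL_NETWORK_NODE_45.dat
if [ -f "$data_file" ]; then
    echo "NUL bytes in $data_file: $(count_nul_bytes "$data_file")"
fi
```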