loading data report error: "unsupported Unicode escape sequence, \u0000 cannot be converted to text."
search cancel

loading data report error: "unsupported Unicode escape sequence, \u0000 cannot be converted to text."

book

Article ID: 296309

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

When loading certain JSON data to GPDB v6 (postgresql v9.4), it would report the following error:
ERROR: unsupported Unicode escape sequence
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: {"data":...


Environment

Product Version: 6.0

Resolution

This is due to design change in postgresql v9.4 (GPDB v6). There is a postgresql community post which talked about this change, a quote was selected from the community post, pasted below:

Source:
https://www.postgresql.org/message-id/8239.1500489054%40sss.pgh.pa.us

Matthew Byrne writes: Are there any plans to support \u0000 in JSONB and, relatedly, UTF code > point 0 in TEXT?

No. It's basically never going to happen because of the widespread use of C strings (nul-terminated strings) inside the backend. Making \0 a legal member of strings would break all those internal APIs, requiring touching far more code than anyone would want to do. It'd likely break a great deal of client-side code as well.

 
The GPDB R&D team also confirmed, even in postgresql v12, this usage will report the same error. The R&D team has no plans to change this design.

However, we do have a workaround, listed below:
SELECT (regexp_replace(the_string::text, '
u0000', '', 'g'))::json;