How to load JSON data using gpload in Tanzu Greenplum
search cancel

How to load JSON data using gpload in Tanzu Greenplum

book

Article ID: 296478

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Generally, it is not so convenient to load JSON data directly using gpload unless through some third-party tool if any.

This article is to describe an easier way to load JSON data using gpload.

Environment

Product Version: 5.21

Resolution

For example, lets say we have an external table:
create external table public.test_gpload_json(id int, t_j json, t text);

You have data that can be loaded:
[root@mdw data]# cat /tmp/data/gpload_test.text
1|null|{"a":'\t'|1, "b":"fsgfs\\\\sdg"}

gpload the control file:
[root@mdw tmp]# cat /tmp/gpload_json.yaml
---
VERSION: 1.0.0.1
DATABASE: gpadmin
USER: gpadmin
HOST: mdw
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - mdw
         PORT: 8081
         FILE:
           - /tmp/data/*
    - COLUMNS:
           - id: integer
           - t_j: json
           - t: text
    - FORMAT: text
    - DELIMITER: '|'
    - ESCAPE: E'\t'
    - ERROR_LIMIT: 25
    - LOG_ERRORS: true
   OUTPUT:
    - TABLE: public.test_gpload_json
    - MODE: INSERT
   PRELOAD:
    - REUSE_TABLES: true

Then load data into public.test_gpload_json, and set column t_j as the value got from casting column t as JSON:
gpadmin=# update test_gpload_json set t_j = cast(t as json);
UPDATE 1
gpadmin=# select * from test_gpload_json;
 id |           t_j              |             t              
----+----------------------------+----------------------------
  1 | {"a": 1, "b":"fsgfs\\sdg"} | {"a": 1, "b":"fsgfs\\sdg"}
(1 row)