How to assign default values to a column during gpload when the input file has no data for the column
search cancel

How to assign default values to a column during gpload when the input file has no data for the column

book

Article ID: 296209

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

This article explains how to assign a default value to a column that does not have values on the input file.


Environment


Cause

When attempting to load data, with gpload, on a table that has default values for the columns, the columns do not get updated with the default values. An example of this behavior is shown below:

 

The table "test" has a column "c" which has the default value of "0" if no records are assigned to the column.
flightdata=# \d test 
     Table "public.test"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | text    | 
 b      | text    | 
 c      | integer | default 0
Distributed by: (a)
Load the data. Observe that the last row has NULL value in last column.
[gpadmin@gpdbsne tmp]$ cat gpload.data 
a|b|1
c|d|2
e|f|3
g|h|
Load the data with gpload using the following command: 
[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters
VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD: 
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - gpdbsne
         PORT: 8081
         FILE:
           - /tmp/gpload.data
    - COLUMNS:
           - a: text
           - b: text
           - c: int
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: ''
   OUTPUT:
    - TABLE: public.test
    - MODE: insert

[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 
2016-03-09 12:05:05|INFO|gpload session started 2016-03-09 12:05:05
2016-03-09 12:05:05|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:05:05|INFO|running time: 0.14 seconds
2016-03-09 12:05:06|INFO|rows Inserted          = 4
2016-03-09 12:05:06|INFO|rows Updated           = 0
2016-03-09 12:05:06|INFO|data formatting errors = 0
2016-03-09 12:05:06|INFO|gpload succeeded
Observe that he column is not populated with the default value:
flightdata=# select * from test;
 a | b | c 
---+---+---
 a | b | 1
 c | d | 2
 e | f | 3
 g | h |  
(4 rows)

Resolution

1. Truncate the table (to remove the old data):
flightdata=# truncate table test ;
TRUNCATE TABLE
flightdata=# \q
2. Modify the YAML file (gpload control file) and use the "MAPPING" keyword, to assign values to the column if found.

Note: Assign the column 'c' if found NULL, then assign "0" as the default:
[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters

VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD: 
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - gpdbsne
         PORT: 8081
         FILE:
           - /tmp/gpload.data
    - COLUMNS:
           - a: text
           - b: text
           - c: int
    - FORMAT: text
    - DELIMITER: '|'
    - NULL_AS: ''
   OUTPUT:
    - MAPPING :
        a : a
        b : b
        c : 'case coalesce(c,0) when 0::numeric then 0::numeric else c end'
    - TABLE: public.test
    - MODE: insert
3. Execute the gpload command:
[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 
2016-03-09 12:07:43|INFO|gpload session started 2016-03-09 12:07:43
2016-03-09 12:07:43|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:07:43|INFO|running time: 0.20 seconds
2016-03-09 12:07:43|INFO|rows Inserted          = 4
2016-03-09 12:07:43|INFO|rows Updated           = 0
2016-03-09 12:07:43|INFO|data formatting errors = 0
2016-03-09 12:07:43|INFO|gpload succeeded
[gpadmin@gpdbsne tmp]$ psql
psql (8.2.15)
Type "help" for help.
4. The data loaded is now formatted correctly, as shown below: 
flightdata=# select * from test;
 a | b | c 
---+---+---
 a | b | 1
 c | d | 2
 e | f | 3
 g | h | 0
(4 rows)