This article explains how to assign a default value to a column that does not have values on the input file.
The table "test" has a column "c" which has the default value of "0" if no records are assigned to the column.
flightdata=# \d test
Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
a | text |
b | text |
c | integer | default 0
Distributed by: (a)
Load the data. Observe that the last row has NULL value in last column.
[gpadmin@gpdbsne tmp]$ cat gpload.data a|b|1 c|d|2 e|f|3 g|h|Load the data with gpload using the following command:
[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters
VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD:
INPUT:
- SOURCE:
LOCAL_HOSTNAME:
- gpdbsne
PORT: 8081
FILE:
- /tmp/gpload.data
- COLUMNS:
- a: text
- b: text
- c: int
- FORMAT: text
- DELIMITER: '|'
- NULL_AS: ''
OUTPUT:
- TABLE: public.test
- MODE: insert
[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters
2016-03-09 12:05:05|INFO|gpload session started 2016-03-09 12:05:05
2016-03-09 12:05:05|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30
2016-03-09 12:05:05|INFO|running time: 0.14 seconds
2016-03-09 12:05:06|INFO|rows Inserted = 4
2016-03-09 12:05:06|INFO|rows Updated = 0
2016-03-09 12:05:06|INFO|data formatting errors = 0
2016-03-09 12:05:06|INFO|gpload succeeded
Observe that he column is not populated with the default value:
flightdata=# select * from test; a | b | c ---+---+--- a | b | 1 c | d | 2 e | f | 3 g | h | (4 rows)
flightdata=# truncate table test ; TRUNCATE TABLE flightdata=# \q2. Modify the YAML file (gpload control file) and use the "MAPPING" keyword, to assign values to the column if found.
[gpadmin@gpdbsne tmp]$ cat /tmp/gpload.paramters
VERSION: 1.0.0.1
DATABASE: flightdata
USER: gpadmin
HOST: gpdbsne
PORT: 3014
GPLOAD:
INPUT:
- SOURCE:
LOCAL_HOSTNAME:
- gpdbsne
PORT: 8081
FILE:
- /tmp/gpload.data
- COLUMNS:
- a: text
- b: text
- c: int
- FORMAT: text
- DELIMITER: '|'
- NULL_AS: ''
OUTPUT:
- MAPPING :
a : a
b : b
c : 'case coalesce(c,0) when 0::numeric then 0::numeric else c end'
- TABLE: public.test
- MODE: insert
3. Execute the gpload command:
[gpadmin@gpdbsne tmp]$ gpload -f /tmp/gpload.paramters 2016-03-09 12:07:43|INFO|gpload session started 2016-03-09 12:07:43 2016-03-09 12:07:43|INFO|started gpfdist -p 8081 -P 8082 -f "/tmp/gpload.data" -t 30 2016-03-09 12:07:43|INFO|running time: 0.20 seconds 2016-03-09 12:07:43|INFO|rows Inserted = 4 2016-03-09 12:07:43|INFO|rows Updated = 0 2016-03-09 12:07:43|INFO|data formatting errors = 0 2016-03-09 12:07:43|INFO|gpload succeeded [gpadmin@gpdbsne tmp]$ psql psql (8.2.15) Type "help" for help.4. The data loaded is now formatted correctly, as shown below:
flightdata=# select * from test; a | b | c ---+---+--- a | b | 1 c | d | 2 e | f | 3 g | h | 0 (4 rows)