gpload preload jobs fail after upgrade >= 6.25.2
search cancel

gpload preload jobs fail after upgrade >= 6.25.2

book

Article ID: 383801

calendar_today

Updated On:

Products

VMware Tanzu Data Suite Greenplum VMware Tanzu Greenplum Pivotal Data Suite Non Production Edition VMware Tanzu Data Suite

Issue/Introduction

After upgrading to 6.25.2 or above

This gpload ran an insert and truncate and then a  User defined function analyzeDB. The function caused the gplaod to not complete and also hung the analyzeDB. 

 cat gploadyaml.txt
VERSION: 1.0.0.1
DATABASE: admin
HOST: host1.broadcom.net
PORT: 7004
GPLOAD:
    INPUT:
        - SOURCE:
            FILE:
                - /tmp/load.csv        - COLUMNS:
            - id: int
            - age: int        - MAX_LINE_LENGTH: 655350
        - FORMAT: text
        - DELIMITER: ','
        - ESCAPE: 'OFF'
        - NULL_AS: ''
        - QUOTE: '"'
        - HEADER: true    EXTERNAL:
        - SCHEMA: staging    OUTPUT:
        - TABLE: test.table
        - MODE: INSERT    PRELOAD:
        - TRUNCATE: true    SQL:
       
          - AFTER: select schema.analyzedb('test','table'); 

Cause

This is  expected behavior in GPDB >= 6.25.2

Because the default procedure of gpload is that all transactions are executed inside one transactions.

According to customer's config file, there is truncate sql, insert sqls and after sqls which include UDF functions.

The UDF function will start a subprocess to execute the analyze operation which will hang because of the truncate sql.

 

This default behavior was introduced in 6.25.2 to keep all gpload transactions within one transaction. The reason for this was if the gpload failed the after truncate would not rollback. 

Resolution

To have this work as it did prior to the upgrade to 6.25.2 you can  specify `--no_auto_trans` to keep the same affect as before.

From gpload documentation:

Specify --no_auto_trans to deactivate processing the load operation as a single transaction if you are performing a single load operation on the target table.

By default, gpload processes each load operation as a single transaction to prevent inconsistent data when performing multiple, simultaneous operations on a target table.