Troubleshooting a PXF external query in Greenplum

Article ID: 296286


Products

VMware Tanzu Greenplum, VMware Tanzu Data Suite

Issue/Introduction

This article discusses how to troubleshoot Greenplum external queries against remote Hadoop clusters that use the Platform Extension Framework (PXF).

These troubleshooting steps should be helpful in scenarios where:

  • Similar external queries were previously working.

  • You have recently enabled PXF for Greenplum using the documented procedures but still encounter issues.

First, every Greenplum Segment host must be able to communicate with ALL Hadoop HDFS nodes.

Note: It may be helpful to spend 5 minutes reviewing the NameNodes and DataNodes section of the Hadoop HDFS Architecture Overview.
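A quick way to verify that reachability is a port probe from each segment host. This is a minimal sketch; the host names below are placeholders, and the ports are assumptions (8020 is the common NameNode RPC port, 9866 the Hadoop 3 DataNode transfer port; Hadoop 2 typically uses 50010). Substitute your cluster's actual values:

```shell
#!/bin/sh
# Placeholder HDFS node list and ports -- substitute your own cluster's values.
HDFS_NODES="namenode.example.com datanode1.example.com datanode2.example.com"
PORTS="8020 9866"

for host in $HDFS_NODES; do
  for port in $PORTS; do
    # -z: probe only, send no data; -w 3: three-second timeout
    if nc -z -w 3 "$host" "$port" 2>/dev/null; then
      echo "$host:$port reachable"
    else
      echo "$host:$port UNREACHABLE"
    fi
  done
done
```

Run the sketch on every segment host; any UNREACHABLE line points to a firewall, routing, or DNS problem to resolve before investigating PXF itself.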

 

Resolution

Checklist: Troubleshooting the PXF Hadoop connectors

Let's say you or your end-users have the following PXF external tables:

  • Table 1 PXF: HDFS External Readable Table

CREATE EXTERNAL TABLE pxf_hdfs_tbl_1col (
    id text
) LOCATION (
    'pxf://user/gpadmin/gpdb5_data/pxf_hdfs_1col.txt?PROFILE=HdfsTextSimple'
) ON ALL
FORMAT 'text' (delimiter E'\t' null E'\\N' escape E'\\')
ENCODING 'UTF8';
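The pieces of that LOCATION URI map directly onto the hdfs commands used later in this checklist: the path portion is the HDFS path, and the PROFILE query parameter selects the PXF connector. A small sketch of the decomposition:

```shell
#!/bin/sh
# Decompose the pxf:// LOCATION URI from Table 1.
uri='pxf://user/gpadmin/gpdb5_data/pxf_hdfs_1col.txt?PROFILE=HdfsTextSimple'

path="/${uri#pxf://}"      # strip the scheme, restore the leading slash
path="${path%%\?*}"        # drop the query string
profile="${uri##*PROFILE=}"

echo "HDFS path: $path"    # the path 'hdfs dfs -ls' should find
echo "Profile:   $profile" # the PXF connector that handles the request
```

This is why the validation steps below point the hdfs client at /user/gpadmin/gpdb5_data/pxf_hdfs_1col.txt.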

 

  • Table 2 PXF: HDFS External Writable Table
CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_tbl_1col_wr (
    id text
) LOCATION (
    'pxf://user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr?PROFILE=HdfsTextSimple'
)
FORMAT 'text' (delimiter E' ' null E'\\N' escape E'\\')
ENCODING 'UTF8' DISTRIBUTED RANDOMLY;
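When rows are inserted into this writable table, each Greenplum segment writes its own file under the LOCATION directory. The naming sketch below is an assumption inferred from the step 4 directory listing later in this article (a timestamp and transaction identifier followed by the segment number), shown here only to explain why a four-segment cluster leaves four files per write:

```shell
#!/bin/sh
# Assumption for illustration: files are named <timestamp>-<xid>_<segment>.
# The timestamp and xid values are copied from the step 4 listing.
ts="1556750803"; xid="0000002994"
for seg in 0 1 2 3; do
  echo "${ts}-${xid}_${seg}"
done
```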

 

  • Table 3 PXF: HBase External Table
CREATE EXTERNAL TABLE pxf_hbase_tbl_orderinfo (
    "product:name" character varying,
    "shipping_info:zipcode" integer
) LOCATION (
    'pxf://order_info?PROFILE=HBase'
) ON ALL
FORMAT 'custom' (formatter = 'pxfwritable_import')
ENCODING 'UTF8';

 

From every segment host running PXF, you should be able to:

  • Quickly install the Hadoop clients to validate that Hadoop connectivity from the host works outside of Greenplum and PXF.

  • For example, on a RHEL 7 or CentOS 7 host:
cd /etc/yum.repos.d
wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.2.0/hdp.repo
yum install hadoop-client
yum install hbase

Then run the basic client utilities using the $PXF_CONF configuration. 
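Before running those clients, it is worth confirming that $PXF_CONF/servers/default actually exists and contains the Hadoop client configuration files. A minimal sketch, where the fallback path is only an illustrative guess (use your actual $PXF_CONF):

```shell
#!/bin/sh
# $PXF_CONF is normally set in the gpadmin environment; the fallback path
# here is an assumption for illustration only.
PXF_CONF="${PXF_CONF:-/usr/local/greenplum-pxf/conf}"
CONF_DIR="$PXF_CONF/servers/default"

if [ -d "$CONF_DIR" ]; then
  echo "found: $CONF_DIR"
  ls "$CONF_DIR"          # expect core-site.xml, hdfs-site.xml, etc.
else
  echo "missing: $CONF_DIR -- check the PXF server configuration"
fi
```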

For example:

1. List the remote HDFS LOCATION file for Table 1:

$ hdfs --config $PXF_CONF/servers/default dfs -ls /user/gpadmin/gpdb5_data/pxf_hdfs_1col.txt 
-rw-r--r--   3 gpadmin gpadmin         21 2019-04-27 21:16 /user/gpadmin/gpdb5_data/pxf_hdfs_1col.txt
$ 

 

2. Display ('-cat') the contents of the remote HDFS LOCATION file for Table 1:

$ hdfs --config $PXF_CONF/servers/default dfs -cat /user/gpadmin/gpdb5_data/pxf_hdfs_1col.txt
1
2
3
4
5
6
7
8
9
10
$

 

3. Create the directory referenced by the remote HDFS LOCATION for Writable Table 2:

$ hdfs --config $PXF_CONF/servers/default dfs -mkdir /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr

 

4. As an alternative to the previous step, you can also list ('-ls') or display ('-cat') the contents of the remote HDFS LOCATION directory for Writable Table 2:

$ hdfs --config $PXF_CONF/servers/default dfs -ls /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr
Found 4 items
-rw-r--r--   3 gpadmin gpadmin          4 2019-05-01 22:33 /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr/1556750803-0000002994_0
-rw-r--r--   3 gpadmin gpadmin          6 2019-05-01 22:33 /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr/1556750803-0000002994_1
-rw-r--r--   3 gpadmin gpadmin          7 2019-05-01 22:33 /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr/1556750803-0000002994_2
-rw-r--r--   3 gpadmin gpadmin          4 2019-05-01 22:33 /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr/1556750803-0000002994_3
$

 

$ hdfs --config $PXF_CONF/servers/default dfs -cat /user/gpadmin/gpdb5_data/pxf_hdfs_tbl_1col_wr/1556750803-0000002994_0
4
8
$ 
Note: For more information on the hdfs CLI, refer to the HDFS Commands reference.


5. Check the HBase table referenced in the LOCATION URI for Table 3 by connecting through the HBase shell from the segment host. If successful, you will:

  • Receive the following HBase CLI prompt:
hbase --config $PXF_CONF/servers/default shell
...
hbase(main):001:0>
  • Be able to list the HBase table in the LOCATION URI:
hbase(main):002:0> list "order_info"
TABLE
order_info
1 row(s) in 0.0120 seconds

=> ["order_info"]
hbase(main):003:0>
  • Be able to perform an HBase table scan:
hbase(main):006:0> scan "order_info"
ROW                                                                     COLUMN+CELL
 1                                                                      column=product:location, timestamp=1533511907001, value=out of stock
 1                                                                      column=product:name, timestamp=1533511906943, value=tennis racquet
 1                                                                      column=shipping_info:state, timestamp=1533511907058, value=CA
 1                                                                      column=shipping_info:zipcode, timestamp=1533511907127, value=12345
 2                                                                      column=product:location, timestamp=1533511907185, value=on floor
 2                                                                      column=product:name, timestamp=1533511907153, value=soccer ball
 2                                                                      column=shipping_info:state, timestamp=1533511907216, value=CO
 2                                                                      column=shipping_info:zipcode, timestamp=1533511907239, value=56789
 3                                                                      column=product:location, timestamp=1533511907296, value=warehouse
 3                                                                      column=product:name, timestamp=1533511907266, value=snorkel set
 3                                                                      column=shipping_info:state, timestamp=1533511907315, value=OH
 3                                                                      column=shipping_info:zipcode, timestamp=1533511907371, value=34567
3 row(s) in 0.0400 seconds

hbase(main):007:0>

Note: For more information on the HBase shell, refer to the Apache HBase Shell.  

Attachments

GphdfsTest