Row count mismatch between source and target (Greenplum permanent table) when PXF ingests from ORC files hosted on NFS storage

Article ID: 423924

Updated On:

Products

VMware Tanzu Data Suite, VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire

Issue/Introduction

You are ingesting data from ORC files hosted on Network File System (NFS) storage and notice that the row count of the source ORC files on NFS does not match the row count of the permanent Greenplum table the data is ingested into.

Cause

PXF has no awareness of whether duplicate files are present on some of the segments. As a result, segments read unexpected files relative to the actual file list, which leads to missing or duplicate rows in the target table.
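To make the duplicate-row case concrete, the following Python sketch models the file list held by each segment and shows how a file listed twice inflates the ingested row count. It is a conceptual model only, not PXF's implementation; the paths, row counts, and function names are hypothetical, and it does not model the missing-row case.

```python
from collections import Counter


def find_duplicate_files(segment_file_lists):
    """Return the sorted paths that appear more than once across all segment lists."""
    counts = Counter(
        path for files in segment_file_lists.values() for path in files
    )
    return sorted(path for path, n in counts.items() if n > 1)


def rows_read(segment_file_lists, rows_per_file, deduplicate=False):
    """Sum the rows read across segments, optionally skipping repeated paths."""
    seen = set()
    total = 0
    for files in segment_file_lists.values():
        for path in files:
            if deduplicate and path in seen:
                continue  # this file was already assigned to a segment
            seen.add(path)
            total += rows_per_file[path]
    return total


# Hypothetical example data: b.orc is listed by both segments.
rows_per_file = {
    "/nfs/data/a.orc": 100,
    "/nfs/data/b.orc": 250,
    "/nfs/data/c.orc": 50,
}
segment_file_lists = {
    0: ["/nfs/data/a.orc", "/nfs/data/b.orc"],
    1: ["/nfs/data/b.orc", "/nfs/data/c.orc"],
}

duplicates = find_duplicate_files(segment_file_lists)
rows_with_duplicates = rows_read(segment_file_lists, rows_per_file)
rows_deduplicated = rows_read(segment_file_lists, rows_per_file, deduplicate=True)
```

In this model the source holds 400 rows, but reading the duplicated file twice yields 650; skipping repeated paths brings the total back to 400.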

Resolution

Workaround - 

The issue is observed to be intermittent. Rerunning the ingestion can complete successfully with the actual data, so that the row counts of the source and the target match.
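Because the issue is intermittent, one way to apply this workaround is to verify the row counts after each run and repeat the ingestion on a mismatch. The Python sketch below shows that loop under stated assumptions: the four callables are hypothetical hooks that you would implement for your environment (for example, a query that counts rows in the target table), and none of them is part of PXF or Greenplum.

```python
def ingest_until_counts_match(
    run_ingest,
    count_source_rows,
    count_target_rows,
    reset_target,
    max_attempts=3,
):
    """Repeat the ingestion until the target row count equals the source row count.

    All four callables are hypothetical, caller-supplied hooks:
      run_ingest()        -- performs one ingestion into the target table
      count_source_rows() -- returns the expected row count from the ORC files
      count_target_rows() -- returns the row count of the target table
      reset_target()      -- empties the target so a rerun does not append rows
    Returns the attempt number on which the counts matched.
    """
    expected = count_source_rows()
    actual = None
    for attempt in range(1, max_attempts + 1):
        reset_target()
        run_ingest()
        actual = count_target_rows()
        if actual == expected:
            return attempt
    raise RuntimeError(
        f"row counts still differ after {max_attempts} attempts: "
        f"expected {expected}, got {actual}"
    )
```

Resetting the target before every attempt is a design choice in this sketch, so that a failed run cannot leave partial or duplicate rows behind for the next one.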

 

Permanent Solution -

PXF 6.11.4 will include a fix so that the segments remove duplicate files from their file lists when duplicates are detected.
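To check whether an environment is at or above the release named here, a version string can be compared numerically rather than as text. The sketch below assumes the fix ships in 6.11.4 as stated and that the version string contains a `major.minor.patch` triple; how you obtain that string is left to you, and the function names are hypothetical.

```python
import re

# Release that this article says will contain the fix.
FIXED_VERSION = (6, 11, 4)


def parse_version(text):
    """Extract the first major.minor.patch triple from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_duplicate_file_fix(version_text):
    """Return True if the given PXF version is at or above FIXED_VERSION."""
    return parse_version(version_text) >= FIXED_VERSION
```

Comparing integer tuples avoids the string-comparison pitfall in which "6.9.10" sorts after "6.11.4".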