PXF jobs failing to read/write on an NFS mount on a NAS device

Article ID: 378521


Products

VMware Tanzu Greenplum, Pivotal Data Suite Non Production Edition, VMware Tanzu Data Suite, VMware Tanzu Data Suite Greenplum

Issue/Introduction

When using PXF writable external tables to write data to Network Attached Storage (NAS) over the NFS protocol, smaller jobs may complete successfully, but larger tables show a significant decline in performance, resulting in considerably longer processing times, or the jobs may fail with the traces below.

2025-04-18 09:58:36.304 IST ERROR [174373-0000000002:pxf_abc:0  ] 231749 --- [88-exec-11] o.g.p.s.b.WriteBridge                    : Failed to close bridge resources: Invalid argument
2025-04-18 09:58:36.305 IST  INFO [174373-0000000002:pxf_abc:0  ] 231749 --- [88-exec-11] o.g.p.s.c.BaseServiceImpl                : Failed write operation [2387 ms, 3 records, 1.26 records/sec, 400 bytes, 167.57 bytes/sec] for null
2025-04-18 09:58:36.305 IST ERROR [174373-0000000002:pxf_abc:0  ] 231749 --- [88-exec-11] o.g.p.s.c.PxfErrorReporter               : Invalid argument

java.io.IOException: Invalid argument
        at java.io.FileOutputStream.close0(Native Method) ~[?:1.8.0_432]
        at java.io.FileOutputStream.access$000(FileOutputStream.java:53) ~[?:1.8.0_432]
        at java.io.FileOutputStream$1.close(FileOutputStream.java:356) ~[?:1.8.0_432]
        at java.io.FileDescriptor.closeAll(FileDescriptor.java:212) ~[?:1.8.0_432]
        at java.io.FileOutputStream.close(FileOutputStream.java:354) ~[?:1.8.0_432]
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.close(RawLocalFileSystem.java:250) ~[hadoop-common-2.10.2.jar!/:?]
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159) ~[?:1.8.0_432]
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.10.2.jar!/:?]
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.10.2.jar!/:?]
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:414) ~[hadoop-common-2.10.2.jar!/:?]
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.10.2.jar!/:?]
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.10.2.jar!/:?]
        at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:64) ~[parquet-hadoop-1.11.1.jar!/:1.11.1]
        at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:865) ~[parquet-hadoop-1.11.1.jar!/:1.11.1]
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:122) ~[parquet-hadoop-1.11.1.jar!/:1.11.1]
        at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165) ~[parquet-hadoop-1.11.1.jar!/:1.11.1]
        at org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor.closeForWrite(ParquetFileAccessor.java:287) ~[pxf-hdfs-6.8.0.jar!/:?]
        at org.greenplum.pxf.service.bridge.WriteBridge.endIteration(WriteBridge.java:105) ~[classes!/:6.8.0]
        at org.greenplum.pxf.service.controller.WriteServiceImpl.readStream(WriteServiceImpl.java:74) ~[classes!/:6.8.0]
        at org.greenplum.pxf.service.controller.WriteServiceImpl.lambda$writeData$0(WriteServiceImpl.java:40) ~[classes!/:6.8.0]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_432]
        at javax.security.auth.Subject.doAs(Subject.java:360) ~[?:1.8.0_432]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1918) ~[hadoop-common-2.10.2.jar!/:?]
        at org.greenplum.pxf.service.security.BaseSecurityService.doAs(BaseSecurityService.java:122) ~[classes!/:6.8.0]
        at org.greenplum.pxf.service.controller.BaseServiceImpl.processData(BaseServiceImpl.java:74) ~[classes!/:6.8.0]
        at org.greenplum.pxf.service.controller.WriteServiceImpl.writeData(WriteServiceImpl.java:40) ~[classes!/:6.8.0]
        at org.greenplum.pxf.service.rest.PxfWriteResource.produceResponse(PxfWriteResource.java:52) ~[classes!/:6.8.0]
        at org.greenplum.pxf.service.rest.PxfWriteResource.produceResponse(PxfWriteResource.java:19) ~[classes!/:6.8.0]
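
For reference, a writable external table that sends Parquet data through a PXF file-based server generally looks like the sketch below. This is a minimal, hedged example only; the database name, server name (nas_nfs), target directory, and column list are hypothetical placeholders for your environment.

# Hypothetical sketch: create a PXF writable external table that writes
# Parquet files to an NFS-backed PXF server (names and paths are placeholders)
psql -d mydb -c "
CREATE WRITABLE EXTERNAL TABLE ext_write_nas (id int, name text)
LOCATION ('pxf://data/output_dir?PROFILE=file:parquet&SERVER=nas_nfs')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
"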

Environment

GPDB 6.27.1 writing to a NAS using PXF 6.8.0

Cause

Cause 01 - Out of space

The space required during the load is greater than the size of the final files alone, because additional space is needed for temporary files while the copy is in progress. The resulting error is shown below.

The log below shows that writes were completing successfully before they began failing with the error. In other runs the job never failed outright; it simply became extremely slow.

2024-09-28 19:12:27.698 IST  INFO [881322-0000000002:host01:263] 44196 --- [88-exec-15] o.g.p.s.c.BaseServiceImpl                : Completed write operation [140 ms, 1 record, 7.14 records/sec, 2061 bytes, 14721.43 bytes/sec]
2024-09-28 20:08:20.868 IST ERROR [885444-0000000002:host02:260] 44196 --- [888-exec-4] o.g.p.s.b.WriteBridge                    : Failed to close bridge resources: Invalid argument
2024-09-28 20:08:20.868 IST ERROR [885444-0000000002:host02:262] 44196 --- [88-exec-11] o.g.p.s.b.WriteBridge                    : Failed to close bridge resources: Invalid argument
2024-09-28 20:08:20.869 IST  INFO [885444-0000000002:host02:260] 44196 --- [888-exec-4] o.g.p.s.c.BaseServiceImpl                : Failed write operation [390851 ms, 1 record, 0.00 records/sec, 343 bytes, 0.88 bytes/sec] for null
2024-09-28 20:08:20.869 IST  INFO [885444-0000000002:host02:262] 44196 --- [88-exec-11] o.g.p.s.c.BaseServiceImpl                : Failed write operation [390851 ms, 1 record, 0.00 records/sec, 321 bytes, 0.82 bytes/sec] for null
2024-09-28 20:08:20.869 IST ERROR [885444-0000000002:host02:262] 44196 --- [88-exec-11] o.g.p.s.c.PxfErrorReporter               : Invalid argument
2024-09-28 20:08:20.869 IST  INFO [885444-0000000002:host02:260] 44196 --- [888-exec-4] o.g.p.s.c.BaseServiceImpl                : Failed write operation [390851 ms, 1 record, 0.00 records/sec, 343 bytes, 0.88 bytes/sec] for null
2024-09-28 20:08:20.869 IST  INFO [885444-0000000002:host02:262] 44196 --- [88-exec-11] o.g.p.s.c.BaseServiceImpl                : Failed write operation [390851 ms, 1 record, 0.00 records/sec, 321 bytes, 0.82 bytes/sec] for null
2024-09-28 20:08:20.869 IST ERROR [885444-0000000002:host02:262] 44196 --- [88-exec-11] o.g.p.s.c.PxfErrorReporter               : Invalid argument
java.io.IOException: Invalid argument
        at java.io.FileOutputStream.close0(Native Method) ~[?:1.8.0_422]
        at java.io.FileOutputStream.access$000(FileOutputStream.java:53) ~[?:1.8.0_422]
        at java.io.FileOutputStream$1.close(FileOutputStream.java:356) ~[?:1.8.0_422]
        at java.io.FileDescriptor.closeAll(FileDescriptor.java:212) ~[?:1.8.0_422]
        at java.io.FileOutputStream.close(FileOutputStream.java:354) ~[?:1.8.0_422]
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.close(RawLocalFileSystem.java:250) ~[hadoop-common-2.10.2.jar!/:?]
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159) ~[?:1.8.0_422]
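
The free space (and free inodes) on the export can be checked from every Greenplum host at once while the job is running. The mount point /mountpoint and the hostfile name below are placeholders for your environment.

# Check free blocks on the NFS mount across all cluster hosts
gpssh -f hostfile -e "df -h /mountpoint"

# Temporary files also consume inodes, so check inode usage as well
gpssh -f hostfile -e "df -i /mountpoint"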

Cause 02 - Incompatible NFS versions for the mount point configured on the Greenplum cluster hosts.

 

Check the NFS version of the mount <mountpointname> on every host:

[gpadmin@host1 ~]$ gpssh -f hostfile -e "mount" |grep cdpabc

[host1] 172.20.20.160:/mountpoint on /mountpoint type nfs4 (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.20.0.20,local_lock=none,addr=172.20.20.160)

[host2] 172.20.20.160:/mountpoint on /mountpoint type nfs4 (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.20.0.21,local_lock=none,addr=172.20.20.160)

[hostm] 172.20.20.160:/mountpoint on /mountpoint type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.20.20.160,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=172.20.20.160)
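
In the output above, host1 and host2 mount the share as NFSv4.1 (type nfs4, vers=4.1), while hostm mounts the same export as NFSv3 (type nfs, vers=3). This version mismatch across the cluster hosts leads to the inconsistent behavior and write failures.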

Resolution

Resolution for Cause 01 -

Free up more space on the storage, or use a different NAS with more free space.

Resolution for Cause 02 -
 
Update the NFS mounts so that every Greenplum host uses the same, compatible NFS version for the share.
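
As a hedged example only (the host, export path, mount point, and NFS version below are placeholders taken from the sample output above), the divergent host can be remounted with an explicit NFS version and the result verified across the cluster:

# On the host that mounts the share with a different NFS version (hostm above)
umount /mountpoint
mount -t nfs -o vers=4.1 172.20.20.160:/mountpoint /mountpoint

# Pin the version in /etc/fstab so it persists across reboots, for example:
# 172.20.20.160:/mountpoint  /mountpoint  nfs  rw,hard,proto=tcp,vers=4.1  0 0

# Confirm that all hosts now report the same NFS version
gpssh -f hostfile -e "mount | grep /mountpoint"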