In case 35867787, the customer reported a number of PXF "Invalid chunk header" errors when writing to Amazon S3 external storage.
Error received on the client connection:
ERROR - Failed to execute job 57170903 for task udw_writable_ext_table (failed sending to remote component '127.0.0.1:5888' (libchurl.c:649) (seg10 10.125.73.46:27182 pid=85352) (cdbdisp.c:254)
Sample error in pxf-service.log:
2024-10-22 11:11:26.0058 ERROR tomcat-http--6 org.greenplum.pxf.service.rest.WritableResource - Exception: totalWritten so far 0 to ecd-udw-adhoc-dev/source/sync_monthly_realtime_subscriptions_1_prt__202410.csv
java.io.IOException: Invalid chunk header
at org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:615)
at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:192)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:316)
at org.apache.coyote.Request.doRead(Request.java:442)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at org.apache.tomcat.util.buf.ByteChunk.checkEof(ByteChunk.java:431)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:413)
at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:200)
at java.io.DataInputStream.read(DataInputStream.java:100)
GPDB: 5.30.0
PXF: 5.16
Storage: Amazon S3
Amazon S3 throttled the connection due to the amount of data being written.
To resolve this issue, the customer added the fs.s3a.multipart.size property to s3-site.xml so that the upload is broken into smaller multipart chunks.
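A minimal sketch of the s3-site.xml change, assuming the PXF 5.x server configuration layout ($PXF_CONF/servers/s3/s3-site.xml). The 32 MB part size is illustrative only; the exact value the customer used is not recorded in the case, and existing credential properties in the file are left unchanged.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Existing S3 credential properties remain as configured -->
    <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_AWS_ACCESS_KEY</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_AWS_SECRET_KEY</value>
    </property>
    <!-- Upload data to S3 in smaller multipart chunks to reduce throttling -->
    <property>
        <name>fs.s3a.multipart.size</name>
        <value>33554432</value> <!-- 32 MB, in bytes; illustrative value -->
    </property>
</configuration>

After editing the file, the configuration typically needs to be copied to all segment hosts (for example with pxf cluster sync) before new writable external table sessions pick up the change.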