When accessing an external table using PXF, Greenplum reports "ERROR: PXF server error(500) : java.lang.ArrayIndexOutOfBoundsException"
The PXF log reports:
2023-02-10 11:20:36.138 EST ERROR [124487-0000000008:dbname_dbuser:8 ] 763588 --- [88-exec-10] o.g.p.s.c.PxfErrorReporter : 89
java.lang.ArrayIndexOutOfBoundsException: 89
at org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor.getTypeForColumnDescriptor(ParquetFileAccessor.java:557) ~[pxf-hdfs-6.5.0.jar!/:?]
at org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor.lambda$generateParquetSchema$2(ParquetFileAccessor.java:499) ~[pxf-hdfs-6.5.0.jar!/:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_192]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) ~[?:1.8.0_192]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_192]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_192]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_192]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_192]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[?:1.8.0_192]
at org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor.generateParquetSchema(ParquetFileAccessor.java:500) ~[pxf-hdfs-6.5.0.jar!/:?]
at org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor.openForWrite(ParquetFileAccessor.java:251) ~[pxf-hdfs-6.5.0.jar!/:?]
at org.greenplum.pxf.service.bridge.WriteBridge.lambda$beginIteration$0(WriteBridge.java:60) ~[classes!/:6.5.0]
at org.greenplum.pxf.service.utilities.GSSFailureHandler.execute(GSSFailureHandler.java:64) ~[classes!/:6.5.0]
at org.greenplum.pxf.service.bridge.WriteBridge.beginIteration(WriteBridge.java:60) ~[classes!/:6.5.0]
at org.greenplum.pxf.service.controller.WriteServiceImpl.readStream(WriteServiceImpl.java:66) ~[classes!/:6.5.0]
at org.greenplum.pxf.service.controller.WriteServiceImpl.lambda$writeData$0(WriteServiceImpl.java:40) ~[classes!/:6.5.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_192]
:
:
:
Cause:
One of the columns in the external table was defined as NUMERIC with a very high precision.
The current logic in PXF takes Apache Hive's data type limitations into account. In Hive, DECIMAL / NUMERIC types have a maximum precision of 38. By default, the precision of the Parquet NUMERIC type is also 38, but if a Greenplum table NUMERIC column defines its own precision, that precision is used instead.
However, PXF assumes that the precision will never exceed the maximum of 38, so a column defined with a larger precision causes the ArrayIndexOutOfBoundsException shown above.
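
To find the offending column, it may help to scan the external table's column definitions for NUMERIC / DECIMAL types declared with a precision above 38. The following is a minimal sketch in Python, not part of PXF or Greenplum: the function name, the input shape (a list of column name and type declaration pairs) and the sample columns are all hypothetical, chosen only for illustration.

```python
import re

# Maximum DECIMAL / NUMERIC precision in Hive, as stated in the Cause section.
HIVE_MAX_PRECISION = 38

# Matches NUMERIC(p), NUMERIC(p, s), DECIMAL(p) and DECIMAL(p, s), ignoring case.
_NUMERIC_RE = re.compile(
    r"^\s*(?:numeric|decimal)\s*\(\s*(\d+)\s*(?:,\s*-?\d+\s*)?\)\s*$",
    re.IGNORECASE,
)


def find_oversized_numeric_columns(columns):
    """Return (name, precision) for each column declared with precision > 38.

    `columns` is an iterable of (column_name, type_declaration) pairs.
    This input shape is an assumption made for this sketch.
    """
    oversized = []
    for name, type_decl in columns:
        match = _NUMERIC_RE.match(type_decl)
        if match is None:
            # Not NUMERIC / DECIMAL, or declared without an explicit precision.
            continue
        precision = int(match.group(1))
        if precision > HIVE_MAX_PRECISION:
            oversized.append((name, precision))
    return oversized


# Hypothetical column definitions for illustration only.
example_columns = [
    ("id", "integer"),
    ("amount", "numeric(38, 10)"),
    ("ratio", "NUMERIC(90,20)"),
    ("total", "numeric"),
]
print(find_oversized_numeric_columns(example_columns))
```

Columns declared as plain NUMERIC without a precision are skipped here, because the text above says they default to a precision of 38.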