There is a known issue with the snappy package: it uses the /tmp directory to extract and execute its native library. If /tmp is mounted with the noexec option, snappy cannot load, and PXF queries fail with errors such as the following:
ERROR: ERROR: PXF server error(500) : Could not initialize class org.xerial.snappy.Snappy (seg1 slice1 xxx.xxx.xxx.xxx:45001 pid=87316)
DETAIL: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:279)
at org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:230)
at org.apache.parquet.bytes.BytesInput.toInputStream(BytesInput.java:239)
at org.apache.parquet.column.impl.ColumnReaderBase.readPageV1(ColumnReaderBase.java:650)
at org.apache.parquet.column.impl.ColumnReaderBase.access$300(ColumnReaderBase.java:57)
at org.apache.parquet.column.impl.ColumnReaderBase$3.visit(ColumnReaderBase.java:593)
at org.apache.parquet.column.impl.ColumnReaderBase$3.visit(ColumnReaderBase.java:590)
at org.apache.parquet.column.page.DataPageV1.accept(DataPageV1.java:120)
at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:590)
at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:564)
at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor.readNextObject(ParquetFileAccessor.java:186)
at org.greenplum.pxf.service.bridge.ReadBridge.getNext(ReadBridge.java:86)
at org.greenplum.pxf.service.controller.ReadServiceImpl.processFragment(ReadServiceImpl.java:157)
at org.greenplum.pxf.service.controller.ReadServiceImpl.writeStream(ReadServiceImpl.java:101)
at org.greenplum.pxf.service.controller.ReadServiceImpl.lambda$null$0(ReadServiceImpl.java:58)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1918)
at org.greenplum.pxf.service.security.BaseSecurityService.doAs(BaseSecurityService.java:122)
at org.greenplum.pxf.service.controller.BaseServiceImpl.processData(BaseServiceImpl.java:74)
at org.greenplum.pxf.service.controller.ReadServiceImpl.lambda$readData$1(ReadServiceImpl.java:58)
at org.greenplum.pxf.service.controller.PxfErrorReporter.invokeWithErrorHandling(PxfErrorReporter.java:28)
at org.greenplum.pxf.service.controller.ReadServiceImpl.readData(ReadServiceImpl.java:58)
at org.greenplum.pxf.service.rest.PxfReadResource.lambda$produceResponse$0(PxfReadResource.java:53)
at org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBodyReturnValueHandler$StreamingResponseBodyTask.call(StreamingResponseBodyReturnValueHandler.java:111)
at org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBodyReturnValueHandler$StreamingResponseBodyTask.call(StreamingResponseBodyReturnValueHandler.java:98)
at org.springframework.web.context.request.async.WebAsyncManager.lambda$startCallableProcessing$4(WebAsyncManager.java:355)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.greenplum.pxf.service.spring.PxfContextMdcLogEnhancerDecorator.lambda$decorate$0(PxfContextMdcLogEnhancerDecorator.java:27)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
HINT: Check the PXF logs located in the '/usr/local/pxf-gp6/logs' directory on host 'localhost' or 'set client_min_messages=LOG' for additional details.
CONTEXT: External table pxf_tbl_parquet_read (seg1 slice1 xxx.xxx.xxx.xxx:45001 pid=12345)
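To confirm whether noexec is the cause, check the mount options for /tmp on each PXF host. The `findmnt` invocation below is one way to do this on Linux:

```shell
# Show the mount options for the filesystem that contains /tmp.
# If the output includes "noexec", snappy cannot execute its
# extracted native library from that directory.
findmnt -T /tmp -no OPTIONS
```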
PXF Versions: 6.3.2 - 6.10.1
GPDB Versions: 6.x.x
To address this issue, first verify whether the /tmp directory is mounted with the noexec option. If it is, configure snappy to use a different temporary directory by following the steps outlined below.
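As a sketch of that approach: snappy-java honors the `org.xerial.snappy.tempdir` JVM system property, so pointing it at a directory that allows execution avoids the noexec restriction. The exact steps can differ by deployment; the directory path and heap settings below are placeholders, and this assumes a PXF 6 layout where `PXF_JVM_OPTS` is set in `$PXF_BASE/conf/pxf-env.sh`.

```shell
# Create a temp directory on a filesystem that allows execution,
# owned by the user that runs PXF (placeholder path; adjust per host).
mkdir -p /home/gpadmin/snappy_tmp

# In $PXF_BASE/conf/pxf-env.sh, add the snappy temp dir to the JVM options,
# keeping any existing options (example values shown):
# export PXF_JVM_OPTS="-Xmx2g -Xms1g -Dorg.xerial.snappy.tempdir=/home/gpadmin/snappy_tmp"

# Propagate the configuration change and restart PXF on all hosts:
pxf cluster sync
pxf cluster restart
```

After the restart, re-run the failing query against the external table to confirm the `Could not initialize class org.xerial.snappy.Snappy` error no longer occurs.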