Pivotal HDB query becomes hung when reading large parquet file(s) through PXF service
Article ID: 294598

Products

Services Suite

Issue/Introduction

Symptoms:

Error Message:

/var/gphd/pxf/pxf-service/logs/catalina.out
 

Exception in thread "tomcat-http--7" java.lang.OutOfMemoryError: Java heap space
 at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
 at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
 at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
 at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
 at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
 at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
 at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
 at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
 at com.pivotal.pxf.plugins.hive.HiveAccessor.getReader(HiveAccessor.java:94)
 at com.pivotal.pxf.plugins.hdfs.HdfsSplittableDataAccessor.getNextSplit(HdfsSplittableDataAccessor.java:87)
 at com.pivotal.pxf.plugins.hdfs.HdfsSplittableDataAccessor.openForRead(HdfsSplittableDataAccessor.java:61)
 at com.pivotal.pxf.plugins.hive.HiveAccessor.openForRead(HiveAccessor.java:83)
 at com.pivotal.pxf.service.ReadBridge.beginIteration(ReadBridge.java:50)
 at com.pivotal.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:100)
:

Environment


Cause

Parquet files are compressed on disk and expand when read, so the PXF service requires much more memory than initially anticipated. When reading many large Parquet files, the default Java heap size for the PXF service (512MB) is not enough to process them.
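As a rough illustration of the sizing problem, the sketch below estimates how much heap a single decompressed row group could require. The 200MB row-group size and 4x expansion factor are hypothetical assumptions for illustration, not values measured in this case:

```shell
#!/bin/sh
# Hypothetical sizing sketch: estimate the decompressed size of one
# Parquet row group that PXF must buffer in memory while reading.
compressed_mb=200        # example on-disk row-group size (assumption)
expansion_factor=4       # assumed decompression/encoding expansion (assumption)
needed_mb=$((compressed_mb * expansion_factor))
echo "Estimated heap needed per row group: ${needed_mb}MB"
# → Estimated heap needed per row group: 800MB
# A single 800MB decompressed row group already exceeds the 512MB
# default heap, which matches the OutOfMemoryError seen in catalina.out.
```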

Resolution

On ALL PXF service nodes, increase the Java Heap Space (-Xmx) for the PXF service by editing /var/gphd/pxf/pxf-service/bin/setenv.sh as below and then restart the PXF service.
# vi /var/gphd/pxf/pxf-service/bin/setenv.sh
JAVA_HOME="/usr/java/default"
AGENT_PATHS=""
JAVA_AGENTS=""
JAVA_LIBRARY_PATH=""
JVM_OPTS="-Xmx1024M -Xss256K"
JAVA_OPTS="$JVM_OPTS $AGENT_PATHS $JAVA_AGENTS $JAVA_LIBRARY_PATH"
# service pxf-service restart
The 1024M shown here is not an absolute value for all situations. Depending on the number of Parquet files and their sizes, tune this value accordingly.
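When many PXF nodes must be updated, the edit can be scripted instead of applied by hand. This is a sketch that assumes the file currently contains the default -Xmx512M setting; adjust the sed pattern if your setenv.sh differs:

```shell
# Sketch: bump the PXF heap setting in setenv.sh non-interactively.
# Assumes JVM_OPTS currently contains -Xmx512M (adjust the pattern if not).
SETENV=/var/gphd/pxf/pxf-service/bin/setenv.sh
sed -i 's/-Xmx512M/-Xmx1024M/' "$SETENV"
grep Xmx "$SETENV"          # confirm the new value before restarting
service pxf-service restart
```

Run the same steps on every PXF service node, since a query can be served by any of them.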