When executing sum against a decimal column, Hive returns NULL as the value.
hive> select sum(retail_price) from retail_repro;
Job 0: Map: 10  Reduce: 1   Cumulative CPU: 530.03 sec   HDFS Read: 2600574562 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 8 minutes 50 seconds 30 msec
OK
NULL
However, if you cast the column to double or bigint, the query returns a result instead of NULL:
hive> select sum(double(retail_price)) from retail_repro;
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 16.48 sec   HDFS Read: 30644180 HDFS Write: 21 SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 480 msec
OK
5.1994505839339055E7

hive> select sum(bigint(retail_price)) from retail_repro;
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 14.31 sec   HDFS Read: 30644180 HDFS Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 310 msec
OK
48073762
Note that some values in the retail_price column are 32 digits long. For example, one of the values in retail_price is 0.0000000000000008881784197001252.
Because the maximum precision (MAX_PRECISION) of a HiveDecimal is 38 digits in v0.12.0, the SUM function returns NULL whenever HiveDecimal throws a NumberFormatException (NFE). The problem in this case is that the mapper process does not log any error when the merge function hits the NFE. As shown below, the catch block simply sets sum = null and returns.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java

public void merge(AggregationBuffer agg, Object partial) throws HiveException {
  ...
  try {
    myagg.sum = myagg.sum.add(
        PrimitiveObjectInspectorUtils.getHiveDecimal(partial, inputOI));
  } catch (NumberFormatException e) {
    myagg.sum = null;
  }
You will need to insert a debug statement into the catch block to capture the stack trace and confirm the exception.
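For example, a minimal sketch of such a statement, assuming a Commons Logging LOG field is available in GenericUDAFSum (the class declares one) and using Hadoop's StringUtils to render the stack trace as a string:

import org.apache.hadoop.util.StringUtils;
...
  } catch (NumberFormatException e) {
    // Hypothetical debug statement: log the swallowed NFE and its full
    // stack trace before nulling out the sum, so it appears in the mapper syslog.
    LOG.error("NFE encountered during merge:" + StringUtils.stringifyException(e));
    myagg.sum = null;
  }

With a statement like this in place, rerunning the query surfaces the exception in the mapper container's syslog, as shown below.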
2014-11-30 22:14:40,654 ERROR [main] org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum: NFE encountered during merge:java.lang.NumberFormatException: Assignment would result in truncation
	at org.apache.hadoop.hive.common.type.HiveDecimal.<init>(HiveDecimal.java:53)
	at org.apache.hadoop.hive.common.type.HiveDecimal.<init>(HiveDecimal.java:47)
	at org.apache.hadoop.hive.common.type.HiveDecimal.add(HiveDecimal.java:178)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumHiveDecimal.merge(GenericUDAFSum.java:159)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumHiveDecimal.iterate(GenericUDAFSum.java:127)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:629)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:826)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:723)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:791)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
As a workaround, cast the column to double.
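The cast can be written either with the double() conversion used earlier or with standard CAST syntax, for example:

hive> select sum(cast(retail_price as double)) from retail_repro;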
This behavior has changed in Hive 0.13.x as per Hive_Decimal_Precision_Scale_Support. By tuning precision and scale in the table definition, you can control how HiveDecimal rounds the digits after the decimal point.
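For example, a sketch of a 0.13.x table definition with explicit precision and scale; the table name and the choice of decimal(38,10) are illustrative, and values with more fractional digits than the declared scale are rounded to that scale, so pick a scale that fits your data:

hive> create table retail_repro_13 (retail_price decimal(38,10));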
In addition, the error handling is improved so that the NumberFormatException will be caught and printed to the mapper container syslog.