Greenplum Database 6.x master logs show 1.1G logged every few minutes where most entries are "LOG","00000","HashJoin: Too many batches computed"

Article ID: 296340

Products

VMware Tanzu Greenplum

Issue/Introduction

Note: This issue is fixed in Greenplum Database (GPDB) v6.10.0.

In GPDB versions earlier than v6.10.0, the GPDB master log directory fills with 1.1G log files, with a new file written every few minutes.

For example:
[gpadmin@mdw ~]$ ls -lrth $MASTER_DATA_DIRECTORY/pg_log | tail
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:52 gpdb-2021-01-12_142811.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:52 gpdb-2021-01-12_142856.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:52 gpdb-2021-01-12_142943.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:52 gpdb-2021-01-12_143032.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:52 gpdb-2021-01-12_143121.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:53 gpdb-2021-01-12_143212.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:53 gpdb-2021-01-12_143317.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:53 gpdb-2021-01-12_143432.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:53 gpdb-2021-01-12_143545.csv
-rw-------. 1 gpadmin gpadmin 1.1G Jan 12 18:53 gpdb-2021-01-12_143701.csv
[gpadmin@mdw ~]$ 

Analysis of the GPDB master logs shows that the vast majority of log entries are of the type "LOG","00000","HashJoin: Too many batches computed".

For example:
[gpadmin@mdw pg_log]$ for I in *.csv ; do echo "$I"; grep ',con' "$I" | cut -d ',' -f17 | sort | uniq -c | sort -nr; grep ',con' "$I" | cut -d ',' -f17,18,19 | sort | uniq -c | sort -nr | head; done
gpdb-2021-01-12_080045.csv
  13847 "LOG"
      5 "WARNING"
      4 "ERROR"
   7527 "LOG","00000","HashJoin: Too many batches computed: nbatch=67108864. gp_workfile_limit_files_per_query=100000
   1803 "LOG","00000","HashJoin: Too many batches computed: nbatch=2097152. gp_workfile_limit_files_per_query=100000
   1012 "LOG","00000","HashJoin: Too many batches computed: nbatch=524288. gp_workfile_limit_files_per_query=100000
    727 "LOG","00000","HashJoin: Too many batches computed: nbatch=262144. gp_workfile_limit_files_per_query=100000
    579 "LOG","00000","HashJoin: Too many batches computed: nbatch=4194304. gp_workfile_limit_files_per_query=100000
    397 "LOG","00000","HashJoin: Too many batches computed: nbatch=65536. gp_workfile_limit_files_per_query=100000
    316 "LOG","00000","HashJoin: Too many batches computed: nbatch=1048576. gp_workfile_limit_files_per_query=100000
    258 "LOG","00000","HashJoin: Too many batches computed: nbatch=8388608. gp_workfile_limit_files_per_query=100000
    251 "LOG","00000","HashJoin: Too many batches computed: nbatch=33554432. gp_workfile_limit_files_per_query=100000
    230 "LOG","00000","HashJoin: Too many batches computed: nbatch=131072. gp_workfile_limit_files_per_query=100000
gpdb-2021-01-12_080322.csv
  14452 "LOG"
   9207 "LOG","00000","HashJoin: Too many batches computed: nbatch=67108864. gp_workfile_limit_files_per_query=100000
   1009 "LOG","00000","HashJoin: Too many batches computed: nbatch=4194304. gp_workfile_limit_files_per_query=100000
    905 "LOG","00000","HashJoin: Too many batches computed: nbatch=2097152. gp_workfile_limit_files_per_query=100000
    805 "LOG","00000","HashJoin: Too many batches computed: nbatch=262144. gp_workfile_limit_files_per_query=100000
    729 "LOG","00000","HashJoin: Too many batches computed: nbatch=524288. gp_workfile_limit_files_per_query=100000
    389 "LOG","00000","HashJoin: Too many batches computed: nbatch=1048576. gp_workfile_limit_files_per_query=100000
    363 "LOG","00000","HashJoin: Too many batches computed: nbatch=33554432. gp_workfile_limit_files_per_query=100000
    267 "LOG","00000","HashJoin: Too many batches computed: nbatch=65536. gp_workfile_limit_files_per_query=100000
    211 "LOG","00000","HashJoin: Too many batches computed: nbatch=131072. gp_workfile_limit_files_per_query=100000
    155 "LOG","00000","HashJoin: Too many batches computed: nbatch=8388608. gp_workfile_limit_files_per_query=100000
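To put a single number on how much of a log file these entries take up, the one-liner above can be reduced to a small helper. This is only a sketch: `hashjoin_share` is a hypothetical name, and it reuses the same heuristic as the command above, treating any line containing `,con` (a session ID) as the start of a log entry.

```shell
# Hypothetical helper: count the "HashJoin: Too many batches computed"
# entries in one CSV log file and compare them with the total entry count.
hashjoin_share() {
    file=$1
    # Lines containing ",con" carry a session ID; count them as log entries.
    total=$(grep -c ',con' "$file" || true)
    # Of those entries, count the ones carrying the HashJoin message.
    hits=$(grep ',con' "$file" | grep -c 'HashJoin: Too many batches computed' || true)
    printf '%s: %s of %s entries\n' "$file" "$hits" "$total"
}
```

It can be run over every file in the log directory, for example with `for f in "$MASTER_DATA_DIRECTORY"/pg_log/*.csv; do hashjoin_share "$f"; done`.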


Environment

Product Version: 6.5

Resolution

The logging issue is fixed in GPDB v6.10.0. To resolve this issue, we recommend upgrading to the latest GPDB v6 release.

For more information on upgrading to the latest GPDB 6 release (as of January 2022), refer to the VMware Tanzu Greenplum 6.x Release Notes.
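Before planning an upgrade, it can help to confirm whether a given cluster is on a release that predates the fix. The sketch below is illustrative only: both function names are hypothetical, it handles 6.x version strings only, and it assumes the output of `SELECT version()` contains the text `Greenplum Database <version>`.

```shell
# Hypothetical helper: read a version() string on stdin and print the
# Greenplum version number (assumes the text "Greenplum Database X.Y.Z").
extract_gpdb_version() {
    sed -n 's/.*Greenplum Database \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed (exit 0) when a 6.x version string is
# v6.10.0 or later, the release that contains the logging fix.
has_hashjoin_logging_fix() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second version component
    [ "$major" -eq 6 ] && [ "$minor" -ge 10 ]
}
```

On the master host, the input could come from something like `psql -At -c 'SELECT version()' | extract_gpdb_version`, assuming `psql` can connect with the current environment settings.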

GPDB v6.10.0 code fix summary

The `ExecChooseHashTableSize()` function is called by the query planner when calculating the cost of join paths.

Before the fix, `ExecChooseHashTableSize()` logged its calculations to the GPDB master logs as "LOG","00000","HashJoin: Too many batches computed" entries.

This had the potential to greatly increase the size of the GPDB master logs.

The fix makes these entries visible in the GPDB logs only when 'DEBUG1' logging is set.

Note: With default logging, these entries are no longer captured.
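To check whether a cluster on a fixed release would still write these entries, the configured logging level can be compared against DEBUG1. The sketch below assumes that "DEBUG1 logging" refers to the `log_min_messages` server parameter, where `debug1` through `debug5` are the levels that include DEBUG1 messages; `debug1_entries_logged` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) when the given log_min_messages
# value is verbose enough to include DEBUG1 messages.
debug1_entries_logged() {
    # Normalize case so that "DEBUG1" and "debug1" are treated alike.
    level=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$level" in
        debug[1-5]) return 0 ;;   # debug1 and the more verbose levels
        *)          return 1 ;;   # info, notice, warning, error, log, ...
    esac
}
```

The current value could be read on the master with `psql -At -c 'SHOW log_min_messages'` and passed to the helper as its only argument.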