Master process restarts because the "stats sender process" crashed

Article ID: 296561

Products

VMware Tanzu Greenplum, Greenplum, Pivotal Data Suite Non Production Edition, VMware Tanzu Data Suite

Issue/Introduction

When Greenplum Database (GPDB) runs with Greenplum Command Center (GPCC) enabled, the master process may crash and restart, writing the following messages to pg_log:
2020-09-07 17:21:40.908121 CST,,,p152741,th-2105387232,,,,0,,,seg-1,,,,,"LOG","00000","stats sender process (PID 152752) was terminated by signal 6: Aborted",,,,,,,0,,"postmaster.c",5620,
2020-09-07 17:21:40.908165 CST,,,p152741,th-2105387232,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 152752) was terminated by signal 6: Aborted",,,,,,,0,,"postmaster.c",5620,
These messages indicate that the "stats sender process" crashed (signal 6, SIGABRT), which in turn brought down the server process and triggered the master restart.

The master logs (under $MASTER_DATA_DIRECTORY/pg_log) also contain the following errors from GPCC:

The metrics collector reports that a packet buffer size exceeds the limit:
2020-09-07 17:21:40.309183 CST,,,p152752,th-2105387232,,,,0,con4,,seg-1,,,,,"LOG","00000","Metrics collector: packet buffer #11 size exceeds limit 81101 >= 65496",,,,,,,0,,"metrics_collector.c",220,
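To see by how much a buffer exceeded the limit, the numbers can be pulled out of the message. The sketch below assumes the message reads "size exceeds limit <size> >= <limit>", as the example line suggests. `packet_overflow` is a hypothetical helper name, not part of Greenplum or GPCC.

```shell
# packet_overflow (hypothetical helper): read log lines on stdin and, for each
# "packet buffer ... size exceeds limit" message, print the numbers involved.
# Assumption: the first number is the buffer size and the second is the limit.
packet_overflow() {
    sed -n 's/.*packet buffer #\([0-9][0-9]*\) size exceeds limit \([0-9][0-9]*\) >= \([0-9][0-9]*\).*/\1 \2 \3/p' |
    while read -r buffer size limit; do
        echo "buffer=$buffer size=$size limit=$limit excess=$((size - limit))"
    done
}
```

For example, `packet_overflow < master.gpdb-2020-09-07_164134.csv` would print one line per matching message in that file.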

The stats sender reports a "double free or corruption" error:
$ cat master.gpdb-2020-09-07_164134.csv | grep postgres:
*** glibc detected *** postgres:  5432, stats sender process   : double free or corruption (!prev): 0x000000000151b350 ***",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   (gp_free+0x15)[0x999f95]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   [0x99314c]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   [0x7ec774]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   (perfmon_segmentinfo_start+0x25)[0x7ec945]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   [0x7d6c16]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   [0x7d93b5]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   [0x7dfe2d]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   (PostmasterMain+0xc6a)[0x7e199a]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   (main+0x3b7)[0x719647]",,,,,,,,"SysLoggerMain","syslogger.c",638,
postgres:  5432, stats sender process   [0x4cbb2d]
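To check quickly whether a set of master log files shows this issue, the three signatures above can be counted in one pass. This is a minimal sketch: `count_signatures` is a hypothetical helper name, and the patterns are taken from the example log lines in this article, so they may need adjusting if other versions word the messages differently.

```shell
# count_signatures (hypothetical helper): for each known signature, print the
# number of matching lines across the given log files, then a tab and the pattern.
count_signatures() {
    for pattern in \
        'stats sender process (PID [0-9]*) was terminated by signal' \
        'Metrics collector: packet buffer #[0-9]* size exceeds limit' \
        'double free or corruption'
    do
        # grep -h suppresses file names; wc -l counts the matching lines.
        # In a basic regular expression the parentheses match literally.
        printf '%s\t%s\n' "$(grep -h -- "$pattern" "$@" | wc -l | tr -d ' ')" "$pattern"
    done
}
```

A typical call would be `count_signatures "$MASTER_DATA_DIRECTORY"/pg_log/*.csv`. Non-zero counts for all three patterns around the same timestamp match the symptoms described here.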


Environment

Product Version: 5.28

Resolution

The issue is fixed in GPDB 5.28.1 and above.

30545 - Command Center: The metrics collection code was updated to resolve a buffer overflow condition that could cause Greenplum Database to crash when gp_enable_query_metrics was set to "on."


To fix this, upgrade Greenplum Database to v5.28.1 or higher.
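Whether a cluster already runs a fixed release can be checked from the version string. The sketch below assumes that `SELECT version();` returns text containing "Greenplum Database <version>", and that GNU `sort -V` is available for version ordering. `gpdb_version` and `version_ge` are hypothetical helper names. The comparison is a plain "at least 5.28.1" test, following the statement above.

```shell
# gpdb_version (hypothetical helper): extract e.g. "5.28.0" from a version() string
# such as "PostgreSQL 8.3.23 (Greenplum Database 5.28.0 build ...)".
gpdb_version() {
    printf '%s\n' "$1" | sed -n 's/.*Greenplum Database \([0-9][0-9.]*\).*/\1/p'
}

# version_ge (hypothetical helper): succeed when version $1 >= version $2.
# Relies on version ordering from GNU sort -V (an assumption about the platform).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage against a live cluster (requires psql access to the master):
#   v=$(gpdb_version "$(psql -At -d postgres -c 'SELECT version();')")
#   if version_ge "$v" 5.28.1; then echo "fixed release"; else echo "upgrade needed"; fi
```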

As a temporary workaround, disable gp_enable_query_metrics so that the "stats sender process" is no longer started:

--- check whether the stats sender process is running ---
$ ps -ef | grep "stats sender" | grep 5432
gpadmin    4216   4206  0 09:33 ?        00:00:01 postgres:  5432, stats sender process

--- disable gp_enable_query_metrics and restart GPDB ---
$ gpconfig -c gp_enable_query_metrics -v off
$ gpstop -r

--- verify the process is no longer running ---
$ ps -ef | grep "stats sender" | grep 5432 | wc -l
0

Note: Once gp_enable_query_metrics is disabled, active queries are no longer visible in GPCC.
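For scripted checks, the process verification above can be wrapped in a small function. This is a sketch only: `count_stats_sender` is a hypothetical helper name, and the pattern assumes the process title format shown in the ps output above ("postgres:  <port>, stats sender process").

```shell
# count_stats_sender (hypothetical helper): read `ps -ef` output on stdin and
# print how many "stats sender process" entries belong to the given port.
count_stats_sender() {
    port="$1"
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -c "postgres: *${port}, stats sender process" || true
}
```

Used as `ps -ef | count_stats_sender 5432`, it should print 0 once the workaround is in place. Matching on the full process title, including the comma after the port, avoids counting processes that belong to other ports.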