"-gpstart failed. (Reason='FATAL: the database system is starting up..." error after gpstart hangs and fails because of large gpperfmon directory in Tanzu Greenplum

Article ID: 296705


Products

VMware Tanzu Greenplum

Issue/Introduction

gpstart can appear to hang or take a long time at this stage of the startup:
[gpadmin@test-m ~]$ gpstart -a
20211002:19:20:12:007107 gpstart:test-m:gpadmin-[INFO]:-Starting gpstart with args: -a
20211002:19:20:12:007107 gpstart:test-m:gpadmin-[INFO]:-Gathering information and validating the environment...
20211002:19:20:12:007107 gpstart:test-m:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.17.6 build commit:89abca9000fe3f32862d94276d54b5947fd8b80a'
20211002:19:20:12:007107 gpstart:test-m:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20211002:19:20:12:007107 gpstart:test-m:gpadmin-[INFO]:-Starting Master instance in admin mode

If the condition persists, gpstart can eventually fail with this message:
20211002:19:40:12:005462 gpstart:test-m:gpadmin-[CRITICAL]:-gpstart failed. (Reason='FATAL:  the database system is starting up
DETAIL:  last replayed record at 0/0

The master log shows the same FATAL and DETAIL messages repeating.

Environment

Product Version: 6.16

Resolution

The problem is usually caused by a large $MASTER_DATA_DIRECTORY/gpperfmon/data directory. To confirm this, locate the startup process on the master host using this command:
ps -ef | grep postgres

Running strace -p <startup_process_pid> against that process shows it repeatedly calling lstat on gpperfmon/data/<query_monitor_file>.txt files.

If this directory contains a large number of files, the startup goes through each one, which delays gpstart. This can happen when you run GPCC 6.x but still have the legacy gpperfmon stats collector enabled.
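The diagnosis above can be sketched as a pair of small shell helpers. This is only an illustration: count_files and find_startup_pid are hypothetical names, not Greenplum utilities, and the "startup process" string matched in the process title is an assumption about how the startup process appears in ps output on your system.

```shell
# count_files DIR: print the number of regular files under DIR.
# Hypothetical helper, not part of Greenplum.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# find_startup_pid: print the PID of the postgres startup process.
# Assumes the process title contains "startup process"; the [s] in the
# pattern keeps grep from matching its own command line.
find_startup_pid() {
    ps -ef | grep '[s]tartup process' | awk '{print $2}'
}

# Example usage on the master host (assumes MASTER_DATA_DIRECTORY is set):
#   count_files "$MASTER_DATA_DIRECTORY/gpperfmon/data"
#   strace -p "$(find_startup_pid)"
```

A very large file count, combined with strace output dominated by lstat calls on gpperfmon/data, suggests this article applies.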

To work around this issue, follow these steps:

1. Temporarily move the $MASTER_DATA_DIRECTORY/gpperfmon/data directory to another location.

2. Start the database.

3. Disable the legacy gpperfmon stats collector by running this command:
gpconfig -c gp_enable_gpperfmon -v off

4. Restart the database.
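The four steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not an official procedure: move_aside and apply_workaround are hypothetical helper names, the backup location is whatever directory you pass in (it should be outside the master data directory), and gpstop -ar is assumed as the way to restart the database in step 4.

```shell
# move_aside SRC DEST_PARENT: move SRC into DEST_PARENT, creating
# DEST_PARENT if needed. Hypothetical helper, not part of Greenplum.
move_aside() {
    mkdir -p "$2" && mv "$1" "$2/"
}

# apply_workaround BACKUP_DIR: run the four steps in order, stopping at
# the first failure. Run as gpadmin on the master host with
# MASTER_DATA_DIRECTORY set. BACKUP_DIR is an assumption: choose a
# directory outside the master data directory.
apply_workaround() {
    move_aside "$MASTER_DATA_DIRECTORY/gpperfmon/data" "$1" &&  # step 1
    gpstart -a &&                                               # step 2
    gpconfig -c gp_enable_gpperfmon -v off &&                   # step 3
    gpstop -ar                                                  # step 4
}

# Nothing runs until you call the function yourself, for example:
#   apply_workaround /home/gpadmin/gpperfmon_data_backup
```

Chaining the commands with && means a failed move or a failed gpstart stops the sequence before any configuration change is attempted.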