Modifications to gpmon credentials or the related .pgpass file may lead to a "database unavailable" scenario.
Client connections to Pivotal Greenplum [GPDB] return with an error: "FATAL","53300","sorry, too many clients already"
While connections on the master instance will show the majority of the connections are in "startup", specifically from the Performance Monitor / Command Center User (gpmon):
# ps -ef | grep startup gpadmin 40845 35215 0 Apr08 ? 00:00:00 postgres: port 5432, gpmon gpperfmon [local] [local] startup
The issue may be caused by a reset or modification of the .pgpass file or a modification of the gpmon user password without reflecting the change in the .pgpass file.
The .pgpass file provides the password authentication required for the gpmon to connect to the database.
Logging captured in the $MASTER_DATA_DIRECTORY/pg_log will show that the gpmon user is unable to access the database and connections are setup to timeout.
2015-04-08 17:10:15.673010 EDT,,,p35215,th-1296873696,,,,0,,,seg-1,,,,,"LOG","00000","perfmon process (PID 35224) exited with exit code 1",,,,,,,0,,"postmaster.c",5876, 2015-04-08 17:10:15.682346 EDT,,,p35216,th-1296873696,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log: Performance Monitor - There was a problem accessing the gpperfmon database.",,,,,,,,"SysLoggerMain","syslogger.c",552, 2015-04-08 17:11:15.682579 EDT,"gpmon","gpperfmon",p12212,th-1296873696,"[local]",,2015-04-08 17:10:15 EDT,82728,con5441,,seg-1,,,x82728,sx1,"LOG","00000","Issuing cancel signal (SIGINT) to my self (pid = 12212) for statement timeout.",,,,,,,0,,"proc.c",1551,
However, as the gpmon user is scheduled to collect statistics on a more frequent interval than the default client timeout settings. These connections will begin to queue, waiting in startup, until they timeout. Eventually, they may consume all the available connections.
Follow the steps mentioned below:
1. Begin by stopping the Greenplum Performance Monitor:
# gpperfmon --stop
This should allow the connection in the startup to terminate without queuing new connections. However, in order to set the additional connections to free immediately, the remaining connections in the startup can be safely terminated by issuing a kill command:
# ps -C postgres -o pid=,command= | grep startup | awk {'print $1'} | xargs kill
2. Validate that the pg_hba.conf was updated to allow the connection to the database via password and the gpadmin .pgpass file contains an entry for the gpmon user.
# cat $MASTER_DATA_DIRECTORY/pg_hba.conf | grep gpmon local gpperfmon gpmon md5 host all gpmon 127.0.0.1/28 md5 # cat /home/gpadmin/.pgpass | grep gpmon *:5432:gpperfmon:gpmon:changeme
If an entry does not exist or is incorrect, modify or append the file with the updated or correct entry (the last entry for a user supersedes all others).
# echo *:5432:gpperfmon:gpmon:changeme >> /home/gpadmin/.pgpass
If any modifications have been made, reload the configuration and test the authentication.
# gpstop -u ... # psql -d gpperfmon -U gpmon -c 'select ctime from system_now limit 1;' Password for user gpmon: ctime --------------------- 2015-05-12 07:47:45 (1 row)
3. Once authentication succeeds, you can restart the Performance Monitor. Starting and restarting requires you to specify the gpmon superuser’s password:
$ PGPASSWORD='password1234' gpperfmon --start