This article pertains to all versions of Pivotal Greenplum Database (GPDB).
This article covers some frequently asked Admin DBA related questions.
Check gpstart logfile
using the following command:
~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log
Look in $MASTER_DATA_DIRECTORY/pg_log/startup.log
at the pg startup log file for more details. Also look at the last master log generated after the startup command was issued.
If the segments failed to startup the log, use the following command:
~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log
This command lists the failed segments. Navigate to the host/segment directory to check the logs for more details on the failure.
gpstop -m
and gpstart -m
? How to find the uptime of the Greenplum database? The gpstart -m
command allows a user to start the master only and none of the data segments. It is used primarily by support to get system level information/configuration. An end user would not regularly or normally use it.
To find the uptime of the GPDB use the following command:
select now() - pg_postmaster_start_time() ;
There are no utilities available to remove mirrors from Greenplum. The cluster will need to reinitialize.
The gpcheckcat
tool is used to check catalog inconsistencies between the master and segments. It can be found in the $GPHOME/bin/lib
directory:
Usage: gpcheckcat [dbname] -? -B parallel: number of worker threads -g dir : generate SQL to rectify catalog corruption, put it in dir -h host : DB host name -p port : DB port number -P passwd : DB password -o : check OID consistency -U uname : DB User Name -v : verbose
Example: gpcheckcat gpdb > gpcheckcat_gpdb_logfile.log
gpdetective
and how do I run it in Greenplum? The gpdetective
utility collects information from a running Greenplum Database system and creates a bzip2-compressed
tar output file. This output file helps with the diagnosis of Greenplum Database errors or system failures. For more details check help.
gpdetective --help
To remove the currently configured standby master host from your Greenplum Database system, run the following command in the master only:
# gpinitstandby -r
Use this command only if a standby master is already configured. Use this method when attempting to resynchronize the data between the primary and backup master host. The Greenplum system catalog tables will not be updated.
# gpinitstandby -n (resynchronize)
Use the gprecoverseg
tool, which recognizes which segments need recovery and will initialize recovery.
Note: Without the "-F
" option, the change tracking log will be sent and applied to the mirror. With the "-F
" option, the entire data directory will be resynced.
The gpaddmirrors
utility configures mirror segment instances for an existing Greenplum Database system that was initially configured with primary segment instances only.
For more details check help:
# gpaddmirrors --help
From the database catalog, following query list configuration on content ID, figure out the primary and mirror for each content.
gpdb=# select * from gp_configuration order by content.
Note: Starting from GPDB 4.x, the gp_segment_configuration
table is used.
gpdb=# select * from gp_segment_configuration order by dbid;
gpstart
with option (-R)
stands for Admin mode or a restricted mode. Only superusers can connect to the database when database opened using this option.gpstart -m
. PGOPTIONS='-c gp_session_role=utility' psql
Create a directory where space is available and common in all hosts.
nic
card:gpcheckperf -f seg_host_file_nic-1 -r N -d /data/gpcheckperf > seg_host_file_nic_1.out gpcheckperf -f seg_host_file_nic-2 -r N -d /data/gpcheckperf > seg_host_file_nic_2.out
gpcheckperf -f seg_host_file_nic-1 -r ds -D -d /data/gpdb_p1 -d /data/gpdb_p2 -d /data/gpdb_m1 -d /data/gpdb_m2
In GP 4.0 version, check gpconfig
utility to change postgres.conf
parameters.
pg_hba.conf
? The pg_hba.conf
file of the master instance controls client access and authentication to the Greenplum system.
Use "createuser
" utility to create users. See "createuser --help
" for more details.
Another option is to use SQL commands in the psql prompt to create users.
CREATE USER or ROLE <ROLE_NAME> ....
Use gpssh-exkeys
:
gpssh-exkeys -h hostname1 -h hostname2 .. -h hostnameN
To check the version:
psql> select version();
or
postgres --gp-version
To check the gp version at install:
psql> select * from gp_version_at_initdb;
Connect the GPDB database using psql query catalog or use the show parameter command.
gpdb# select name,setting from pg_settings where name='GUC';
or
gpdb# show <GUC_NAME>;
pg_hba/logfile/master_data_directory
? The location of the Master directory is cd $MASTER_DATA_DIRECTORY,
pg_hba.conf
, postgres.conf
and other GPDB internal directories. The location of Master database logfiles
is cd $MASTER_DATA_DIRECTORY/pg_logs
.
grep for ERRORS
, FATAL
, SIGSEGV
in the pg_logs
directory.
VACUUM reclaims storage occupied by deleted tuples. In normal GPDB operation, tuples that are deleted or obsoleted by an update are not physically removed from their table. They remain present on disk until a VACUUM
is done. Therefore, it is necessary to do VACUUM
periodically, especially on the frequently-updated table.
Unless you need to return space to the OS so that other tables or other parts of the system can use that space, you should use VACUUM
instead of VACUUM FULL
.
VACUUM FULL
is only needed when a table that is mostly dead rows. That is, the vast majority of the contents of the table have been deleted. Even then, VACUUM FULL
should be used when disk space is urgently needed or the table will never again grow to its past size. Do not use it for table optimization or periodic maintenance as it is counterproductive.
Check the guide for more information.
ANALYZE
collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic
. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It is a good idea to run ANALYZE
periodically, or just after making major changes in the contents of a table. Accurate statistics will help the query planner to choose the most appropriate query plan, thereby improving the speed of query processing. A common strategy is to run VACUUM
and ANALYZE
once a day during a low-usage time of day.
Resource queues are used to manager Greenplum database workload management. All user/queries can be prioritized using Resource queues. Refer Admin guide for more details.
gp_toolkit
? The gp_toolkit
is a database schema. It has many tables, views, and functions to better manage Greenplum Database when the DB is up. In 3.x earlier versions, it was referred to as gp_jetpack
.
For more detailed information please refer to the Greenplum Administrator's Guide.
Related articles: