How to Manage Log Files

Products

VMware Tanzu Greenplum

Issue/Introduction

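This article describes how to manage the database log files of VMware Tanzu Greenplum (GPDB), both in general and on a Greenplum Data Computing Appliance (DCA).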

Environment

Resolution

Management of log files on GPDB

The database log files are stored in the pg_log directory of each data directory.

On the master server (such as mdw), the data directory is normally /data/master/gpseg-1, which is the value of the environment variable $MASTER_DATA_DIRECTORY.

On a segment server, the data directory is normally made up of a data mount point, then primary or mirror, then the segment directory name. For example: /data1/primary/gpseg0 or /data2/mirror/gpseg1.
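To find the log file an instance is currently writing to, a small helper such as the following can be used. It is a sketch that assumes the log files end in .csv (as with the default log_filename) and that GNU ls is available. The function name latest_log is hypothetical.

```bash
# Hypothetical helper: print the most recently modified .csv log file in the
# pg_log subdirectory of the given data directory.
latest_log() {
  local datadir="$1"
  # ls -t sorts by modification time, newest first; keep only the first entry.
  ls -t "$datadir"/pg_log/*.csv 2>/dev/null | head -n 1
}

# Usage:
#   latest_log "$MASTER_DATA_DIRECTORY"     # on the master host
#   latest_log /data1/primary/gpseg0        # on a segment host
```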

Common settings for maintaining log files

log_rotation_age defines how often a new log file is automatically created while a database instance is running. By default, a new log file is created every day (1d); this value can be changed using gpconfig.

[gpadmin@mdw ~]$ gpconfig -s log_rotation_age
Values on all segments are consistent
GUC          : log_rotation_age
Master  value: 1d
Segment value: 1d
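As an illustration, the following hypothetical helper (set_log_rotation_age) changes log_rotation_age on the master and all segments with gpconfig, then reloads the configuration with gpstop -u so that running instances pick up the new value without a restart. It is a sketch that assumes it is run as gpadmin on the master host.

```bash
# Hypothetical helper: change log_rotation_age cluster-wide, then reload.
# Assumes it runs as gpadmin on the master host.
set_log_rotation_age() {
  local value="$1"                          # for example 12h, 1d or 60min
  gpconfig -c log_rotation_age -v "$value" || return 1
  gpstop -u                                 # reload postgresql.conf, no restart
}

# Usage:
#   set_log_rotation_age 12h
#   gpconfig -s log_rotation_age            # confirm the new value
```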

The log file name is defined by the log_filename parameter. Its default value, shown below, includes the hour, minute, and second (%H%M%S), so a new log file is created each time the database instance starts:

[gpadmin@mdw ~]$ gpconfig -s log_filename
Values on all segments are consistent
GUC          : log_filename
Master  value: gpdb-%Y-%m-%d_%H%M%S.csv
Segment value: gpdb-%Y-%m-%d_%H%M%S.csv
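Because log_filename is treated as a strftime pattern (evaluated in the time zone set by log_timezone), the names it produces can be previewed approximately with GNU date. The sketch below, using a hypothetical function preview_log_name, shows that the default pattern yields a different name for each start time, while a pattern with a fixed time part, such as gpdb-%Y-%m-%d_000000.csv, yields the same name for every start on the same day. TZ=UTC is an assumption made only to keep the output predictable.

```bash
# Hypothetical helper: show the file name a log_filename pattern would give
# for a given start time. GNU date is only an approximation of the server's
# own expansion; TZ=UTC is assumed to keep the output predictable.
preview_log_name() {
  local pattern="$1" when="$2"
  TZ=UTC date -d "$when" +"$pattern"
}

default='gpdb-%Y-%m-%d_%H%M%S.csv'
daily='gpdb-%Y-%m-%d_000000.csv'

preview_log_name "$default" '2024-05-01 08:00:00'   # gpdb-2024-05-01_080000.csv
preview_log_name "$default" '2024-05-01 14:30:15'   # gpdb-2024-05-01_143015.csv
preview_log_name "$daily"   '2024-05-01 14:30:15'   # gpdb-2024-05-01_000000.csv
```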

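The log_filename value cannot be changed using gpconfig; instead, it must be set manually in the postgresql.conf file of each data directory on each server. For example, to have only one log file per day (even if the database is restarted during the day), append the setting to the master's postgresql.conf as follows, and likewise in each segment data directory: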

[gpadmin@mdw ~]$ echo "log_filename='gpdb-%Y-%m-%d_000000.csv'" >> "$MASTER_DATA_DIRECTORY/postgresql.conf"

This works only if new entries are appended to the existing log file, which is the case when the parameter log_truncate_on_rotation is kept at its default value (off).
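Since the setting has to be added to postgresql.conf in every data directory, a loop such as the following hypothetical helper (set_daily_log_filename) can save some typing on each host. It appends the line only if the identical line is not already present, and skips directories without a postgresql.conf. When a parameter appears more than once in postgresql.conf, the last occurrence takes effect, so the appended line overrides an earlier log_filename entry. This is a sketch; review the resulting files before reloading the configuration.

```bash
# Hypothetical helper: append a daily log_filename setting to postgresql.conf
# in every data directory passed as an argument (run it on each host).
set_daily_log_filename() {
  local line="log_filename='gpdb-%Y-%m-%d_000000.csv'"
  local datadir conf
  for datadir in "$@"; do
    conf="$datadir/postgresql.conf"
    if [ ! -f "$conf" ]; then
      echo "skipping $datadir: no postgresql.conf found" >&2
      continue
    fi
    # Append only if the identical line is not already in the file.
    grep -qxF -- "$line" "$conf" || echo "$line" >> "$conf"
  done
}

# Usage on a segment host, followed by a configuration reload from the master:
#   set_daily_log_filename /data1/primary/gpseg0 /data2/mirror/gpseg1
#   gpstop -u
```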

Other related parameters are:

log_rotation_size
log_statement
log_statement_stats
log_timezone
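To review the current values of the parameters discussed in this section in one pass, gpconfig -s can be run in a loop. The function name show_log_settings is hypothetical, and the sketch assumes gpconfig is on the PATH of the gpadmin user on the master host.

```bash
# Hypothetical helper: print the value of each log-related parameter
# discussed above, one gpconfig -s call per parameter.
show_log_settings() {
  local guc
  for guc in log_rotation_age log_rotation_size log_filename \
             log_truncate_on_rotation log_statement log_statement_stats \
             log_timezone; do
    gpconfig -s "$guc"
  done
}

# Usage (as gpadmin on the master):
#   show_log_settings
```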

Management of log files on a Greenplum Data Computing Appliance (DCA)

On a DCA, the settings in /opt/dca/etc/dca_log_cleanup.conf on each server define which files are monitored, how many of them are kept, and how much space they may use.

The dca_log_cleanup system service reads this file only once, when it starts, and immediately deletes files according to the defined policies. By default, it then checks and deletes files every hour; this interval is defined by the pollInterval variable at the beginning of the file.

Multiple file sets can be defined. Each set has four associated variables:

filesetRegex: A regular expression used as a "glob" to match the set of files.
maxFiles: The maximum number of items (files or directories) matched by the glob that are kept. Any items over this limit are deleted.
maxSize: The maximum total size allowed for all matched items. It must be specified in megabytes or gigabytes using an "M" or "G" suffix. For example, 100 megabytes is "100M" and 10 gigabytes is "10G".
continuousCheck: Specify "False" if only a single check after startup is needed. This variable is optional and defaults to "True", which checks the file set on every polling cycle.

For each set, the oldest files (based on modification time) are deleted once maxFiles or maxSize is exceeded. After editing the configuration file, restart the service so that the new settings take effect. For example:

[root@mdw ~]# service dca_log_cleanup restart
Stopping dca_log_cleanup.py:                       [  OK  ]
Starting dca_log_cleanup.py:                       [  OK  ]
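The deletion rule described above (oldest items first, by modification time, once maxFiles or maxSize is exceeded) can be illustrated with a small shell sketch. It is not the dca_log_cleanup implementation. The function names to_bytes and apply_retention are hypothetical, and the sketch assumes that the items are regular files in a single directory, that GNU find is available, and that "M" and "G" mean 1024 x 1024 bytes and 1024 M respectively.

```bash
# Hypothetical sketch of a maxFiles/maxSize retention rule; NOT the
# dca_log_cleanup implementation. Assumptions: regular files in one
# directory, GNU find, M = 1024*1024 bytes and G = 1024 M.

to_bytes() {                       # "100M" or "10G" -> number of bytes
  local n="${1%[MG]}"
  case "$1" in
    *M) echo $(( n * 1024 * 1024 )) ;;
    *G) echo $(( n * 1024 * 1024 * 1024 )) ;;
    *)  echo "size must end in M or G: $1" >&2; return 1 ;;
  esac
}

# Usage: apply_retention DIR NAME_GLOB MAX_FILES MAX_SIZE
apply_retention() {
  local dir="$1" name_glob="$2" max_files="$3" max_bytes
  max_bytes=$(to_bytes "$4") || return 1

  local count=0 total=0 mtime size path
  # List matching files as "mtime size path", newest first.
  find "$dir" -maxdepth 1 -type f -name "$name_glob" -printf '%T@ %s %p\n' |
    LC_ALL=C sort -rn |
    while read -r mtime size path; do
      count=$(( count + 1 ))
      total=$(( total + size ))
      # Keep the newest files while both limits hold. Once either limit is
      # exceeded, this file and every older one are deleted.
      if (( count > max_files || total > max_bytes )); then
        rm -f -- "$path"
      fi
    done
}

# Usage (hypothetical directory and pattern):
#   apply_retention /var/log/myapp 'myapp.log.*' 10 100M
```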

Verify that exactly one dca_log_cleanup process is running on each host. For example:

[root@mdw ~]# gpssh -f ~/hostfile 'pgrep -fl dca_log_cleanup | grep -v grep | wc -l'
[ mdw] 1
[smdw] 1
[sdw1] 1
[sdw2] 1
[sdw3] 1
[sdw4] 1
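In a larger cluster, hosts that do not report exactly one process can be picked out of the gpssh output automatically. The following hypothetical helper (check_single_cleanup) is a sketch that assumes each output line has the form shown above, that is, the host name in square brackets followed by the count.

```bash
# Hypothetical helper: read gpssh output such as "[sdw1] 1" on stdin and
# report every host whose dca_log_cleanup process count is not exactly 1.
# Exits with status 0 if all hosts report 1, and 1 otherwise.
check_single_cleanup() {
  awk '
    /^\[/ {
      host = $0
      sub(/^\[ */, "", host)      # drop "[" and any padding spaces
      sub(/\].*/, "", host)       # drop "]" and the count
      count = $NF
      if (count != 1) { print host ": " count " processes"; bad = 1 }
    }
    END { exit bad }
  '
}

# Usage (as root on mdw):
#   gpssh -f ~/hostfile 'pgrep -fl dca_log_cleanup | grep -v grep | wc -l' | check_single_cleanup
```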

If the same policy is to be applied to all servers in the cluster, keep the file synchronized on all servers (for example, by copying it with scp) and restart the service on each server. If you are still unable to manage log files according to your requirements, contact Customer Service.