postgres database is filling up root file system on Pivotal Command Center host

Products

Services Suite

Issue/Introduction

Symptoms:

The disk space usage of /var/lib/pgsql is 10GB and growing on the Pivotal Command Center (PCC) node.

/var/lib/psql                    - 11 G

Environment

Cause

PCC will continuously collect Hadoop and system metrics from the installed clusters. Depending on the size of the cluster, regular maintenance may be necessary to keep the database size under control.

Resolution

1. Connect to the database and identify what tables are using the most space:

$ psql -U postgres gphdmgr -p 10432
gphdmgr=# SELECT tablename ,pg_size_pretty(pg_total_relation_size(tablename::text)) AS size FROM pg_tables WHERE schemaname = 'public' order by 1 DESC;
            tablename             |    size
----------------------------------+------------
 data_node_report                 | 10 GB
 system_history_agg_all_hosts_4h  | 240 kB
 system_history_agg_all_hosts_30m | 1560 kB
 system_history_agg_all_hosts_1m  | 2656 kB
 system_history_agg_all_hosts_1h  | 816 kB
 system_history_agg_all_hosts_1d  | 88 kB
 system_history                   | 14 MB
 stack_properties                 | 32 kB
 stack                            | 32 kB

2. In this case, data_node_report table is 10GB and is used for collecting HDFS usage statistics. This table is safe to truncate and will not impact ICM or PCC operations. This will simply purge the historical statistics from the database.

The following list of tables are safe to execute steps 3-6. Any table not in this list should never be modified, deleted, or changed without consulting pivotal support.

app	Yarn Application details
app_attempt	Yarn Container details for yarn application
data_node_report	HDFS data usage statistics
hbase_metrics	Hbase Operation metrics
job	Mapreduce History stats for completed applications
job_attempt	Mapreduce Containers stats
job_tracker_metrics	Resource ManageråÊstatistics
namenode_metrics	Namenode RPC and load statistics
queue_metrics	Yarn Scheduler statistics
system_history	CPU, Mem, IO stats for each host

NOTE: Please take caution when proceeding through steps 3 and 6. Truncating the wrong table could result in disabling icm_client operations. In addition, truncating the tables in the above list will result in the lost of historical cluster statistics.

3. First backup the database to a safe location incase any mistakes are made during this maintenance activity.

# service commander stop

Stopping PHD Manager Services...
nodeagent is not running
Stopping Jetty                                             [  OK  ]
Stopping httpd:                                            [  OK  ]
Stopping Pivotal Command Center HTTPS                      [  OK  ]
Stopping Pivotal Command Center Background Worker          [  OK  ]

# service postgresql stop
Stopping postgresql service:                               [  OK  ]

# tar -cf /backup/pgsql.backup.12-01-2014.tar /var/lib/pgsql

4. After the backup is completed start the database and truncate the large tables.

# service postgresql start
Starting postgresql service:                               [  OK  ]

# psql -U postgres -p 10432 gphdmgr

gphdmgr=# TRUNCATE TABLE data_node_report;
TRUNCATE TABLE

5. Vacuum the database. Please DO NOT CANCEL this operation once triggered.

gphdmgr=# VACUUM FULL;
VACUUM

6. Remember to REINDEX the database to clean up related index tables and any toast tables.

gphdmgr=# REINDEX DATABASE gphdmgr;
NOTICE:  table "pg_class" was reindexed
NOTICE:  table "pg_type" was reindexed
.
.
REINDEX
gphdmgr=#

7. Confirm database usage is decreased.

gphdmgr=# SELECT 'data_node_report' ,pg_size_pretty(pg_total_relation_size('data_node_report')) AS size;
     ?column?     | size
------------------+-------
 data_node_report | 16 kB

8. Start command center services.

# service commander start

Starting PHD Manager Services...
Starting httpd:                                            [  OK  ]
Preparing nodeagent:                                       [  OK  ]
Starting  nodeagent:                                       [  OK  ]

Starting Jetty                                             [  OK  ]
Waiting for Jetty to mount applications                    [  OK  ]
Starting Pivotal Command Center HTTPS                      [  OK  ]
Starting Pivotal Command Center Background Worker          [  OK  ]

8. During this procedure we disabled PostgreSql and that might have caused the nmon process to shutdown on the pivotal command center node. To ensure nmon is still running, execute the following steps from the pivotal command center node.

# service nmon start
# massh  <hostfile> verbose "service nmon restart"
# massh  <hostfile> verbose "service nmon status"