Note: This procedure assumes you are looking for this information from the last 90 days.
This time frame can be adjusted by modifying the number in queries where you see,
current_date -90
.
These queries are run against the Postgres log files in
$MASTER_DATA_DIRECTORY/pg_log
. If you archive and remove Postgres log files on a regular basis, this may limit how far back you can actually go using the logic presented in this article.
Additionally, the logic presented here requires log connections to be turned on (
log_connections=on
) in the
postgresql.conf
file:
In this case, the table audit_table_full contains the raw data regarding table access.
Note: If you want the audit table to retain historical data, comment out the "
drop table
" statement. If this is done, it may cause an error message indicating the table already exists. This behavior is expected, and can be ignored:
drop table if exists audit_table_full;
create table audit_full
( loguser varchar(16), -- username
logdatabase varchar(128), -- database name
logsession varchar(20), -- session id
tname varchar(128), -- table name
updated timestamp) -- when touched)
distributed randomly;
create temp table auditor
(tname varchar(128))
distributed randomly;
insert into auditor
select schemaname || '.' || tablename from pg_catalog.pg_tables
where schemaname not in ('information_schema','gp_toolkit','pg_catalog') and
schemaname not like 'pg_temp%' -- exclude temp tables
union
select ' ' || tablename from pg_catalog.pg_table
where schemaname = 'public' ;
The Temp table holds a record for each database connection record found.
create temp table audit_connect
(logtime timestamp, -- timestamp on pg_log record
loguser varchar(64), -- userid
logdatabase varchar(64), -- database name
logsession varchar(20), -- session id
tname varchar(128)) -- table name
distributed randomly;
Read from Postgres log file(s) and insert the appropriate records into
audit_connect
table 1 for each connection made by any user connecting to this database.
insert into audit_connect
select min(logtime), loguser, logdatabase, logsession, tname
from gp_toolkit.__gp_log_master_ext, auditor
where logmessage like 'connection authorized:%'
and logtime > current_date - 90 -- check for activity in last 90 days
and logsession > ' '
and logdatabase = current_database()
group by loguser, logdatabase, logsession, tname;
Get all records referencing any table in the database from the Postgres log file. The session must match session id in
audit_connect
and the table referenced must be in the auditor table. Since you know the log time of the connection record, use this in the "
where
" clause to limit how far back Postgres log file is searched.
create temp table audit_work
( loguser varchar(64), -- userid
logdatabase varchar(64), -- database name
logsession varchar(20), -- session id
tname varchar(128), -- table name
updated timestamp) -- timestamp from postgres log file
distributed randomly;
Note: The cartesian product on the insert statement below is intended. Any type of activity (select, insert, update, or delete) is considered 'valid'. Depending on your needs, you may want to exclude some of these actions.
insert into audit_work
select b.loguser,
b.logdatabase,
b.logsession,
b.tname,
a.logtime
from gp_toolkit.__gp_log_master_ext a,
audit_connect b
where a.logtime > b.logtime and
a.loguser = b.loguser and
a.logdatabase = b.logdatabase and
a.logsession = b.logsession and
a.logmessage like 'statement:%' and
( a.logmessage like '% ' || ltrim(b.tname) || ' %' or
a.logmessage like '% ' || ltrim(b.tname) || ';%') and
( a.logmessage like '%insert%' or
a.logmessage like '%update%' or
a.logmessage like '%select%' or
a.logmessage like '%delete)
;
You need to address the public tables. A user may have entered a table name or qualified with a public schema. The following command will find all tables without a public schema and updated them to reflect the public schema.
update audit_work
set tname = 'public.' || substr(tname,2,127)
where substr(tname,1,1) = ' ';
If you are recreating the
audit_table_full
each execution time, the next SQL statement will not remove any records. However if you are keeping the
audit_table_full
for history, then it is important to remove potential duplicate records.
delete from audit_work a
where exists
(select 1 from audit_full b
where a.loguser = b.loguser and
a.logdatabase = b.logdatabase and
a.logsession = b.logsession and
a.tname = b.tname and
a.updated = b.updated);
Now insert the data into the catalogued
audit_full
table.
insert into audit_full select * from audit_work;
Create a table with all tables not referenced in last 90 days.
Note: Only pick up tables with a schema name, do not pick up tables with a blank schema name, as these have been updated in
audit_full
to have
<public>.<tablename>
.
drop table if exists audit_not_referenced;
create table audit_not_referenced
as select a.tname from auditor a
where a.tname not in
(select distinct(tname) from audit_full)
and substr(a.tname,1,1) != ' ';
The table,
audit_not_referenced
, has a list of all tables currently defined in the database but were not found in the
audit_ful
l table. Below are examples of other queries that can be run against the
audit_full
table.
If you are not cleaning up
audit_full
but still want a list of tables not referenced in last 90 days, run the following query:
select schemaname || '.' || tablename as unreferenced_table
from pg_catalog.pg_tables a
where schemaname not in ('information_schema','gp_toolkit','pg_catalog')
and schemaname not like 'pg_temp%'
and not exists
(select 1 from audit_full b
where tname = a.schemaname || '.' || a.tablename
and updated > current_date - 90); -- this line defines states how far to go back
Run the below command to find out when each table was last referenced:
select tname as table_name,
max(updated) as last_referenced
from audit_full
group by 1 order by 1;
Run the following query to find out which tables a particular user referenced, along with a count of how many times they referenced the table. The example below uses user '
testuser
'.
select loguser as userid,
tname as tablename ,
max(updated) as last_update,
count(*) as count
from audit_full b
where loguser = 'testuser'
group by 1,2 order by 1,2;