How to use the external table to examine the logs


Article ID: 296855


Products

VMware Tanzu Greenplum

Issue/Introduction

The GPDB logs are distributed across every segment's pg_log directory, so if we want to monitor the segment logs for a specific ERROR message, we normally have to either use gpssh or write Python scripts.

Besides those two methods, there is a built-in feature, the web external table, that can help us achieve the same task.
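For reference, the gpssh approach usually looks something like the following. The data directory pattern below only mirrors my lab layout and has to be adjusted to your cluster:

[gpadmin@support-gpddb6-mdw][~]$ gpssh -f hostfile
=> grep "No such file or directory" /data*/segment/*/primary*/gpseg*/pg_log/gpdb-*.csv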

Environment

Product Version: 6.21

Resolution

Here is a simple example: 
[gpadmin@support-gpddb6-mdw][~]$ gpssh -f hostfile
=> echo 'echo YES' > myprogram.sh
=> mkdir -p /home/gpadmin/programs
=> mv myprogram.sh /home/gpadmin/programs/
=> chmod +x /home/gpadmin/programs/myprogram.sh

=# CREATE EXTERNAL WEB TABLE output (output text)
    EXECUTE 'PATH=/home/gpadmin/programs; export PATH; myprogram.sh' 
    FORMAT 'TEXT';

Since we have 4 primary segments, the output is:

gpadmin=# select * from output;
 output
--------
 YES
 YES
 YES
 YES
(4 rows)
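By default, EXECUTE runs the command on every primary segment (ON ALL), which is why we get four rows. The ON clause controls where the command runs; for example, the sketch below (the table name output_on_2 is my own) limits execution to two segments chosen by the database:

=# CREATE EXTERNAL WEB TABLE output_on_2 (output text)
    EXECUTE 'PATH=/home/gpadmin/programs; export PATH; myprogram.sh' ON 2
    FORMAT 'TEXT';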

Now, let's add more to the output. The table below has two columns, and the query returns only two rows because the command is executed once per segment host (ON HOST) and there are only two segment hosts in my lab:
[gpadmin@support-gpddb6-mdw][~]$ gpssh -f hostfile
=> echo 'echo $RANDOM "|" `echo $RANDOM | base64`' > get_log_data.sh
=> chmod +x /home/gpadmin/get_log_data.sh

CREATE EXTERNAL WEB TABLE log_output 
    (linenum int, message text) 
    EXECUTE '/home/gpadmin/get_log_data.sh' ON HOST 
    FORMAT 'TEXT' (DELIMITER '|');

gpadmin=# select * from log_output ;
 linenum |  message
---------+-----------
   25214 |  NTg2Ngo=
     782 |  OTcwMAo=
(2 rows)
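A quick way to confirm the per-host behavior is to simply echo the hostname; this is a minimal sketch, and the table name host_check is my own:

CREATE EXTERNAL WEB TABLE host_check (host text)
    EXECUTE 'hostname' ON HOST
    FORMAT 'TEXT';

Selecting from host_check should return exactly one row per segment host.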


The web external table also exposes the following Greenplum variables to the command's environment:
$GP_CID	Command count of the transaction executing the external table statement.
$GP_DATABASE	The database in which the external table definition resides.
$GP_DATE	The date on which the external table command ran.
$GP_MASTER_HOST	The host name of the Greenplum master host from which the external table statement was dispatched.
$GP_MASTER_PORT	The port number of the Greenplum master instance from which the external table statement was dispatched.
$GP_QUERY_STRING	The SQL command (DML or SQL query) executed by Greenplum Database.
$GP_SEG_DATADIR	The location of the data directory of the segment instance executing the external table command.
$GP_SEG_PG_CONF	The location of the postgresql.conf file of the segment instance executing the external table command.
$GP_SEG_PORT	The port number of the segment instance executing the external table command.
$GP_SEGMENT_COUNT	The total number of primary segment instances in the Greenplum Database system.
$GP_SEGMENT_ID	The ID number of the segment instance executing the external table command (same as dbid in gp_segment_configuration).
$GP_SESSION_ID	The database session identifier number associated with the external table statement.
$GP_SN	Serial number of the external table scan node in the query plan of the external table statement.
$GP_TIME	The time the external table command was executed.
$GP_USER	The database user executing the external table statement.
$GP_XID	The transaction ID of the external table statement.

Example: 
drop external table if exists show_segment_dir;
create external web table show_segment_dir(
    filenode text
) execute 'echo $GP_SEG_DATADIR'
    FORMAT 'TEXT' (DELIMITER '|');

gpadmin=# select * from show_segment_dir ;
                   filenode
-----------------------------------------------
 /data2/segment/segment_6.20.3/primary2/gpseg3
 /data2/segment/segment_6.20.3/primary1/gpseg2
 /data2/segment/segment_6.20.3/primary2/gpseg1
 /data2/segment/segment_6.20.3/primary1/gpseg0
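These variables can also be combined in a single command. Here is a minimal sketch (the table name show_segment_id is my own) that tags each data directory with the ID of the segment that produced it:

create external web table show_segment_id(
    segid int,
    datadir text
) execute 'echo $GP_SEGMENT_ID "|" $GP_SEG_DATADIR'
    FORMAT 'TEXT' (DELIMITER '|');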

Now, let's write something more practical. Suppose we want to monitor the segment logs for the following error message:
157776:2022-07-09 04:31:18.381250 HKT,,,p142648,th-156596352,,,,0,,,seg-1,,,,,"ERROR","58P01","could not open relation 1663/1404026/57456646: No such file or directory",,,,,"writing block 0 of relation 1663/1404026/57456646",,0,,"md.c",1478,"Stack trace:
Let's assume this error can happen on any segment, and there are two things we need to check for this issue:

1) Whether the missing file exists on the filesystem
2) Whether the missing file exists in pg_class

Then, we can write a script like the one below:
drop external table if exists no_such_file_ext;

create external web table no_such_file_ext(
    filenode int,
    info text
) execute '
# look up the OID of the current database through the local segment in utility mode
dboid=`PGOPTIONS=''-c gp_session_role=utility'' psql -d postgres -p $GP_SEG_PORT -qtAXc "select oid from pg_database where datname = ''$GP_DATABASE''"`
# pull the relfilenodes reported as missing from the current day segment log
cat pg_log/gpdb-`date +%Y-%m-%d`_*.csv | grep "No such file" | grep -o "1663/$dboid/[0-9]*" | awk -F "/" "{print \$NF}" | sort | uniq | while read relfilenode; do
    # report only the files that are really absent on the filesystem
    if [ ! -f "base/$dboid/$relfilenode" ]; then
        echo $relfilenode,`hostname`:$GP_SEG_DATADIR/base/$dboid/$relfilenode
    fi
done
' ON ALL FORMAT 'TEXT' (delimiter ',') segment reject limit 10;
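A plain select on this table already lists, one row per affected segment, every relfilenode referenced in today's log that is absent on disk, together with the host and full path:

gpadmin=# select * from no_such_file_ext;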

The script above already uses if [ ! -f ] to check whether the file exists on the filesystem; now we need a left join against gp_dist_random('pg_class') to cross-check pg_class on every segment:
select *
from (
    select *
    from no_such_file_ext e
    left join gp_dist_random('pg_class') c
      on e.gp_segment_id = c.gp_segment_id
     and e.filenode = c.relfilenode
) x
where relname is null;
 filenode |                                          info                                           | relname | relnamespace | reltype | relowner | relam | relfilenode | reltables
pace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs
 | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
----------+-----------------------------------------------------------------------------------------+---------+--------------+---------+----------+-------+-------------+----------
-----+----------+-----------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+--------
-+------------+------------+-------------+----------------+--------------+--------+------------
 57456330 | support-gpddb6-sdw01:/data2/segment/segment_5.28.5/primary1/gpseg0/base/116416/57456330 |         |              |         |          |       |             |
     |          |           |               |               |             |             |         |            |          |           |             |          |          |
 |            |            |             |                |              |        |

So, in the lab result above, relfilenode 57456330 on sdw01's gpseg0 is clearly missing: it is neither on the filesystem nor in pg_class.
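To double-check a hit directly on the affected primary, you can connect to it in utility mode, exactly as the script does, and query its local pg_class. This is only a sketch: the port (6000) and the database name below are placeholders and must be replaced with the gpseg0 values from gp_segment_configuration in your cluster:

[gpadmin@support-gpddb6-sdw01][~]$ PGOPTIONS='-c gp_session_role=utility' psql -d gpadmin -p 6000 -c "select oid, relname from pg_class where relfilenode = 57456330;"

An empty result confirms what the left join already showed: no pg_class entry references that relfilenode on this segment.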

In conclusion, we can use the above method to write similar monitoring scripts that take advantage of the MPP architecture and run on all segments in parallel. It is fast and easy, and we do not need third-party tools to figure out how to execute the script in parallel.