ERROR: missing data for column "loguser" when querying gp_toolkit.gp_log_database
search cancel

ERROR: missing data for column "loguser" when querying gp_toolkit.gp_log_database

book

Article ID: 296268

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

You receive the following error when querying the table gp_toolkit.gp_log_database:
gpadmin=# select * from gp_toolkit.gp_log_database limit 100;
ERROR: missing data for column "loguser" (seg35 slice1 sdw5:1028 pid=753186) DETAIL: External table __gp_log_segment_ext, line 12417992 of execute:cat $GP_SEG_DATADIR/pg_log/*.csv: "total 1091860" 


Environment

Product Version: 4.3

Resolution

This error indicates there is something unusual with the format or contents of some of the .csv files under the pg_log directory of the segment that reported the error.

In this particular case, there was a file named  "s gpdb-2014-07-24_000000.csv" and its contents were not pg_log files but it had been named as such. 

The resolution was to rename the file name to a non csv extension.

Other useful troubleshooting steps:

The table gp_toolkit.gp_log_database is referencing the external table gp_toolkit.__gp_log_segment_ext, which reads all .csv files under $GP_SEG_DATADIR/pg_log/ as a single file, so the line number reported on the error is cumulative of all the .csv files together as one single file.

In order to determine what file is the conflicting one, you can run the following command under the pg_log directory of the segment reporting the error to see a list of the line numbers:

[gpadmin@mdw_lab2 pg_log]$ wc -l *.csv
    137 gpdb-2019-10-20_203215.csv
    168 gpdb-2019-10-20_203258.csv
  11259 gpdb-2019-10-20_203306.csv
   1324 gpdb-2019-10-21_000000.csv
    405 gpdb-2019-10-22_000000.csv
  13293 total

Then work out what file contains the line reported on the error. For example for the above output, if the line reported is 2500, we know it will be in file gpdb-2019-10-20_203306.csv. If we rename it to .txt or other extension, the query should now succeed.

In order to determine what is wrong with the file, we can create the following test external table based on gp_toolkit.__gp_log_segment_ext but pointing to the specific segment and file:
 

CREATE EXTERNAL WEB TABLE test_pivotal (
    logtime timestamp with time zone,
    loguser text,
    logdatabase text,
    logpid text,
    logthread text,
    loghost text,
    logport text,
    logsessiontime timestamp with time zone,
    logtransaction integer,
    logsession text,
    logcmdcount text,
    logsegment text,
    logslice text,
    logdistxact text,
    loglocalxact text,
    logsubxact text,
    logseverity text,
    logstate text,
    logmessage text,
    logdetail text,
    loghint text,
    logquery text,
    logquerypos integer,
    logcontext text,
    logdebug text,
    logcursorpos integer,
    logfunction text,
    logfile text,
    logline integer,
    logstack text
) EXECUTE E'cat $GP_SEG_DATADIR/pg_log/gpdb-*.txt' ON SEGMENT 35 
FORMAT 'csv' (delimiter E',' null E'' escape E'"' quote E'"')
ENCODING 'UTF8';

 

Note: We have specified segment 35 and gpdb-*.txt as the file name (we would have previously renamed the conflictive log file into .txt)

When querying the table, it will now point to the exact line number having the issue, so we can open the file and examine it.