How to collect DDL and statistics information ussing the minirepro utility
search cancel

How to collect DDL and statistics information ussing the minirepro utility

book

Article ID: 295682

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

What is minirepro?

The minirepro utility is a smaller version of the gpsd (Greenplum Schema Dump) utility that collects the schema and statistics information for reproducing issues. gpsd retrieves the complete database metadata dump along with the statistics information for all relations.

Please note that the minirepro will collect pg_statistic entries for the tables involved in the query. These entries may have sampling values of user data. It is therefore recommended to sanitize the output file for these entries and remove them if user data is not supposed to be shared. 

Minirepro makes this easy by only extracting the information needed for the affected query and not for the complete database. It reads the query and gets DDL and statistics information for the relations involved. Usage examples are shown below:

[gpadmin@mdw]$ /usr/local/GP-4.3.7.2/bin/minirepro --version
minirepro 1.0
[gpadmin@mdw]$ /usr/local/GP-4.3.7.2/bin/minirepro --help
Usage: minirepro  [options]

Options:
  --version             show program's version number and exit
  -?, --help            Show this help message and exit
  -h HOST, --host=HOST  Specify a remote host
  -p PORT, --port=PORT  Specify a port other than 5432
  -U USER, --user=USER  Database user to connect with.This is mandatory as of now
  -q QUERY_FILE         file name that contains the query
  -f OUTPUT_FILE        minirepro output file name. Must be absolute path 
Example: minirepro database-name -q sql-file-name -f /tmp/output-file-name -U gpadmin

Note: This utility is made available for v4.3.7.0 and above. The query file should have a single valid formatted query.

 

Environment


Resolution

An example for the database db01 is shown below:

gpadmin@mdw]$ minirepro db01 -q query.sql -f /data/oufile_msq -U gpadmin 
Connecting to database: host=mdw, port=5432, user=gpadmin, db=db01 ...
Extracting metadata from query file query.sql ...
Invoking pg_dump to dump DDL ...
pg_dump -h mdw -p 5432 -U gpadmin -sxO db01 -t 'schema01.table01|table02|table03|table04' -f /tmp/20160419101152/xxxxx.dp.sql
pg_dump -h mdw -p 5432 -U gpadmin -sxO db01 -t 'schema02.tablename' -f /tmp/20160419101152/yyyy.dp.sql
Writing schema DDLs ...
Writing table & view DDLs ...
Writing table statistics ...
Writing column statistics ...
Attaching raw query text ...
--- MiniRepro completed! ---


Below is a list of possible errors that occur while running this utility:


Error 1

Error while running gp_toolkit.gp_dump_query_oids(text).
Please make sure the function is installed and the query file contains single valid query.

error 'ERROR:  Cannot parse query. Please make sure the input contains a single valid query. (gpoptutils.c:53)

Please make sure the input contains a single valid query (gpoptutils.c:53).


Solution
Make sure that the query file used with -q flag has no format/indent errors. Manual syntax and a semantic check is required. As mentioned in the error, there must be only one query in the file. 


Error 2

Error while running gp_toolkit.gp_dump_query_oids(text).
Please make sure the function is installed and the query file contains single valid query.

Solution
The tool needs to be upgraded to the most recent version. The functions need to be installed using the script file located at:

cp $GPHOME/share/postgresql/gp_toolkit.sql gp_toolkit_2.sql

Modify the gp_toolkit_2.sql using vi
 

1. Remove the following statement in the file. This should be in the beginning of the file..

Begin;

2. Save the file. 

Interactively run this file in the database using the following command:

prod_db=# \i gp_toolkit_2.sql ( or ) 
psql -f gp_toolkit_2.sql <dbname>

When running the gp_toolkit_2.sql, the following errors will be produced:

relation "gp_pgdatabase_invalid" already exists  

or 

function "gp_skew_idle_fraction" already exists with same argument types

These errors can be safely ignored.
 

Once done, check the minirepro again. 


Possible Error 3

raise ValueError(errmsg("Extra data", s, end, len(s))) 
ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 134 - 150)

Solution

This issue is produced when timing has been enabled on the system. Edit the .psqlrc file to remove the "\timing" option and save the file. Try minirepro again.

Possible Error 4
 

$ minirepro db01 -q query.sql -f ./outfile -U gpadmin
Connecting to database: host=mdw, port=5432, user=gpadmin, db=db01 ...
Traceback (most recent call last):
  File "/opt/gpdb/greenplum-db/./bin/minirepro", line 413, in <module>
    main()
  File "/opt/gpdb/greenplum-db/./bin/minirepro", line 329, in main
    conn = pgdb.connect(connectionString)
  File "/opt/gpdb/greenplum-db/lib/python/pygresql/pgdb.py", line 483, in connect
    dbtty, dbuser, dbpasswd)
pg.InternalError: fe_sendauth: no password supplied

Solution: Environment doesn't have gpadmin account. Check with customer to see what is the default admin role they use and use it with the -U option instead of "gpadmin"