How to check how much data has been loaded into the GPText index

Article ID: 296244

Products

VMware Tanzu Greenplum

Issue/Introduction

GPText works with Greenplum Database and Apache SolrCloud to store and index big data for information retrieval (query) purposes. When new data is added to a Greenplum table, it must also be pushed into Solr so that it is indexed and available for searching.

To check whether Solr has indexed all of the data in a table, follow the steps below.

Resolution

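1. Check how much data is in the table.

Note: GPText keeps one document per unique key (the id column), so rows that share a key are collapsed into a single record in the index. Count distinct key values so that the table count is comparable with the index count.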

# SELECT count(DISTINCT id) FROM comment;
 count
-------
  2000

2. Now check how much data has been loaded into Solr.

# SELECT * from gptext.search_count('gpadmin.public.comment','*');
 count
-------
  2000
(1 row)

3. Rows inserted into the table later are not searchable until they are also pushed into Solr. In this example the table has grown, but the index still holds only the original rows.

# SELECT count(distinct id) from comment;
 count
-------
  5022    << the table now has 5022 records

# SELECT * from gptext.search_count('gpadmin.public.comment','*');
 count
-------
  2000   << Solr still holds only the original 2000 records

4. Push the data from Greenplum into Solr, then commit the index.

Note: Documents whose id already exists in the index are overwritten rather than added again, so re-indexing the whole table does not create duplicate records in Solr.

# SELECT * FROM gptext.index(TABLE(SELECT * FROM comment), 'gpadmin.public.comment');
# SELECT * FROM gptext.commit_index('gpadmin.public.comment');
# SELECT * from gptext.search_count('gpadmin.public.comment','*');
 count
-------
  5022   << now matches the table count
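The check and the fix in steps 1 to 4 can be scripted. The following is a minimal sketch, not a GPText utility: the function names `run_sql` and `sync_index` are hypothetical, it assumes `psql` is on the PATH and connects without a password prompt, and it assumes the database is `gpadmin` (the first part of the index name in this example). It re-indexes only when the index has fewer documents than the table has distinct keys, because indexing adds or overwrites documents but does not remove extra ones.

```shell
#!/usr/bin/env bash
# Sketch: compare a table's distinct-key count with its GPText document count
# and, if the index is behind, repeat the index/commit steps from the article.
# Assumptions: psql is on PATH and connects without a password prompt.
# Arguments are interpolated into SQL unquoted, so pass trusted names only.

# run_sql DB SQL: run one statement and print only the result value(s)
run_sql() {
  # -A unaligned, -t tuples only; ON_ERROR_STOP gives a non-zero exit on SQL errors
  psql -d "$1" -v ON_ERROR_STOP=1 -At -c "$2"
}

# sync_index DB TABLE INDEX KEY (hypothetical helper)
#   DB     database, e.g. gpadmin
#   TABLE  schema-qualified table, e.g. public.comment
#   INDEX  GPText index name, e.g. gpadmin.public.comment
#   KEY    unique id column of the index, e.g. id
sync_index() {
  local db=$1 table=$2 index=$3 key=$4
  local table_count index_count

  table_count=$(run_sql "$db" "SELECT count(DISTINCT $key) FROM $table;") || return 1
  index_count=$(run_sql "$db" "SELECT * FROM gptext.search_count('$index', '*');") || return 1
  echo "table: $table_count, index: $index_count"

  if [ "$table_count" -eq "$index_count" ]; then
    echo "index is up to date"
    return 0
  fi
  if [ "$index_count" -gt "$table_count" ]; then
    # Indexing adds or overwrites documents; it does not delete extra ones.
    echo "index has more documents than the table; not re-indexing"
    return 1
  fi

  echo "index is behind; re-indexing $table"
  run_sql "$db" "SELECT * FROM gptext.index(TABLE(SELECT * FROM $table), '$index');" >/dev/null || return 1
  run_sql "$db" "SELECT * FROM gptext.commit_index('$index');" >/dev/null || return 1

  # Re-check, as in the last query of step 4
  index_count=$(run_sql "$db" "SELECT * FROM gptext.search_count('$index', '*');") || return 1
  echo "index now: $index_count"
}
```

For the example table this would be called as `sync_index gpadmin public.comment gpadmin.public.comment id`. It prints both counts, and re-indexes and prints the new document count when the index is behind.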

Note: The 'gptext-state stats' command also reports the number of documents (num_docs) in each index.

$ gptext-state stats
...
20190914:13:33:31:004471 gptext-state:gp-mdw-p:gpadmin-[INFO]:-   index name               num_docs   size in bytes
20190914:13:33:31:004471 gptext-state:gp-mdw-p:gpadmin-[INFO]:-   gpadmin.public.comment   5022       2000680
...
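To use the `gptext-state stats` figure in a script, the num_docs column can be extracted with awk. This is a minimal sketch that assumes the output keeps the layout shown above (index name, then num_docs, then size in bytes); the function name `index_num_docs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull num_docs for one index out of `gptext-state stats` output.
# Assumption: the stats lines keep the layout shown above, i.e. the index
# name is followed by num_docs and then the size in bytes.

# index_num_docs INDEX: read stats output on stdin and print num_docs for INDEX;
# exits non-zero if INDEX does not appear in the output.
index_num_docs() {
  awk -v idx="$1" '
    {
      # Fields are whitespace-separated; num_docs is the field after the name.
      for (i = 1; i < NF; i++) {
        if ($i == idx) { print $(i + 1); found = 1; exit }
      }
    }
    END { if (!found) exit 1 }
  '
}
```

Usage: `gptext-state stats 2>&1 | index_num_docs gpadmin.public.comment`. With the sample output above this prints 5022. The `2>&1` is a precaution in case the utility writes its log lines to stderr rather than stdout.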