One scenario of ERROR: cache lookup failed for relation xxx (ruleutils.c:10322) and the explanation
search cancel

One scenario of ERROR: cache lookup failed for relation xxx (ruleutils.c:10322) and the explanation

book

Article ID: 414962

calendar_today

Updated On:

Products

VMware Tanzu Data Suite VMware Tanzu Greenplum VMware Tanzu Greenplum / Gemfire VMware Tanzu Data Suite

Issue/Introduction

Querying a catalog table pg_indexes will return ERROR: cache lookup failed for relation <index oid> (ruleutils.c:10322)

Preparation

Create a test table, insert data into it, and analyze it

drop table if exists test1 ;
create table test1 (calendar_date date);
insert into test1(calendar_date) values('20250101');
analyze test1 ;

Reproduce the Issue

1. Create index on the table

drop index if exists index_test1;
create index index_test1 on test1(calendar_date);

2. Open session 1 begin a transaction to re-create the index:

begin;
drop index if exists index_test1;
create index index_test1 on test1(calendar_date);

3. Open session 2 and query pg_indexes:

select * from pg_indexes where tablename='test1'; <----------- this query will hung

4. Go to session 1 and commit the transaction;

5, Now the session 2 will throw out ERROR: cache lookup failed for relation <index oid> (ruleutils.c:10322)

 

Environment

GPDB 6

Cause

This is a expected behavior which was inherited from upstream postgres, let's break down the the cause step by step

Step 1 - session 1 - uncommitted transaction will lock both the table and index

gpadmin=# begin;
BEGIN
gpadmin=# drop index if exists index_test1;
DROP INDEX
gpadmin=# create index index_test1 on test1(calendar_date);
CREATE INDEX
gpadmin=# select pid, sess_id from pg_stat_activity where pid = pg_backend_pid();
   pid   | sess_id
---------+---------
 3559635 |     361
(1 row)
gpadmin=# select l.locktype, l.relation, c.relname,c.relkind, l.mode, l.granted, l.mppsessionid from pg_locks l join pg_class c on l.relation = c.oid where mppsessionid=361 and relation in (select oid from pg_class where relname in ('index_test1', 'test1'));
locktype | relation |  relname   | relkind |        mode         | granted | mppsessionid
----------+----------+-------------+---------+---------------------+---------+--------------
relation |  1444935 | test1   | r       | ShareLock           | t       |          361
relation |  1444935 | test1   | r       | AccessExclusiveLock | t       |          361
relation |  1444935 | test1   | r       | ShareLock           | t       |          361
relation |  1444935 | test1   | r       | AccessExclusiveLock | t       |          361
relation |  1444935 | test1   | r       | ShareLock           | t       |          361
relation |  1444935 | test1   | r       | AccessExclusiveLock | t       |          361
 relation |  1444938 | index_test1 | i       | AccessExclusiveLock | t       |          361
 relation |  1444938 | index_test1 | i       | AccessExclusiveLock | t       |          361
 relation |  1444938 | index_test1 | i       | AccessExclusiveLock | t       |          361
(9 rows)

Step 2 - Session 2 - pg_indexes query will wait until AccessExclusiveLock released

- Now the following query will be stuck on waiting to gain a AccessShareLock on relation 'test1', however, such lock will not be granted unless the AccessExclusiveLock in session 1 get released.

gpadmin=# select * from pg_indexes where tablename='test1'; <----------- this query will hung

gpadmin=# select l.locktype, l.relation, c.relname,c.relkind, l.mode, l.granted, l.mppsessionid from pg_locks l join pg_class c on l.relation = c.oid where relation in (select oid from pg_class where relname in ('index_test1', 'test1')) and granted = 'f';
 locktype | relation | relname  | relkind |      mode       | granted | mppsessionid
----------+----------+----------+---------+-----------------+---------+--------------
relation |  1444935 | test1 | r       | AccessShareLock | f       |          472
(1 row)

- Note that at this moment, the index oid was staying as the old one 1444938 because session 1 was not getting committed

gpadmin=# select oid,relname from pg_class where relname in ('index_test1', 'test1');
   oid   |  relname
---------+------------
 1444935 | test1
 1444938 | index_test1
(2 rows)

Step 3 - commit session 1, session 2 was granted with required lock, only to find the old index oid is gone, thus returned with error.

- End session 1

session 1
gpadmin=# end ;
COMMIT

- Session 2 now can finish but will return with an error because the index oid has been recreated so the old index oid will no longer be found.

gpadmin=# select * from pg_indexes where tablename='test1';
ERROR:  cache lookup failed for relation 1444938 (ruleutils.c:10350)

- Checking again now to identify the index oid change

gpadmin=# select oid,relname from pg_class where relname in ('index_test1', 'test1');
   oid   |  relname
---------+------------
 1444939 | index_test1
 1444935 | test1
(2 rows)

Resolution

This is a expected by-design behavior, re-run the query it will work fine,