Load datawarehouse job has failed in Prod in the initial run due to error. We resolved the issue and tried rerunning the Datawarehouse job. It has also failed with ERROR: must be a superuser to terminate superuser process error.Below is the error log.
Couldn't execute SQL: CALL DWH_FIN_PLAN_PER_FACTS_LOAD(P_LAST_LOAD_DATE => to_timestamp('1910/01/01 00:00:00', 'yyyy/MM/dd HH24:mi:ss')::timestamp, P_CURRENT_DIM_LOAD_DATE => to_timestamp('2023/07/15 00:45:50', 'yyyy/mm/dd HH24:MI:SS')::timestamp);
ERROR: must be a superuser to terminate superuser process
Where: SQL statement "SELECT pg_terminate_backend(v_pid)"
PL/pgSQL function dwh_disable_constraints_sp(character varying,character varying) line 60 at PERFORM
SQL statement "CALL DWH_DISABLE_CONSTRAINTS_SP('DWH_FIN_PLAN_PERIOD_FACTS','FK')"
PL/pgSQL function dwh_fin_plan_per_facts_load(timestamp without time zone,timestamp without time zone) line 199 at CALL
Release : 16.1.2
Root cause is the Load job running at a different time than usual.
To understand what exactly happens, the load job dwh_fin_plan_per_facts_load, performs a disable constraints operation via a stored procedure dwh_disable_constraints_sp.
-- Disable the constraints +
CALL DWH_DISABLE_CONSTRAINTS_SP('DWH_FIN_PLAN_PERIOD_FACTS','FK'); +
-- Recreate the indices +
CALL DWH_SAVEDROP_CREATE_INDEXES_SP('DWH_FIN_PLAN_PERIOD_FACTS','SAVE_DROP');
And this disable constraints SP, performs a terminate of sessions that are holding LOCKs on tables. The scheduled job usually starts at 12:00 AM AEST. However, the other day that was manually triggered at 12:45 AM AEST. This manual run of job coincided with the regular backups schedule.
IF v_proceed = 0 THEN +
PERFORM pg_terminate_backend(v_pid); +
v_proceed := 1; +
v_sql_text := 'TERMINATED PROCESS ID '||v_pid::text||' as it was locking '||fk.table_name||' .'; +
INSERT INTO dwh_cfg_general_message VALUES (fk.constraint_name::character varying(32),v_sql_text, localtimestamp); +
END IF; +
-- If there are no locks, then the procedure will function as it should by deleting the constraints +
-- If there are locks, then the constraint doesn't get disabled +
The manually triggered load job started at 12:45 AM AEST. The backups trigger at 01:00 AM AEST and usually finish in about 15-20 minutes. When the load job reached the disable constraints step, the backup was in progress. Backup which runs with super user role(postgres), could not be killed by the pg_terminate_backend(v_pid) by the application user(xxxxxx), which cascaded to the failure of the load job. PFB the backup time for the database on the issue day(times in UTC).
backuplogid | begintime | endtime | database | dbhost | status | message | logfile | backupsize
-------------+---------------------+---------------------+-------------+----------------------+---------+-------------------------------+---------+------------
| 2023-07-14 15:07:58 | 2023-07-14 15:12:57 | | | SUCCESS | Parallel backup got completed | |