"Multiple updates to a row by the same query is not allowed" error when running a non-deterministic update on a table with the Query Optimizer in Tanzu Greenplum
search cancel

"Multiple updates to a row by the same query is not allowed" error when running a non-deterministic update on a table with the Query Optimizer in Tanzu Greenplum

book

Article ID: 296220

calendar_today

Updated On:

Products

VMware Tanzu Greenplum Greenplum Pivotal Data Suite Non Production Edition VMware Tanzu Data Suite VMware Tanzu Data Suite

Issue/Introduction

When attempting an update statement in a Greenplum Database (GPDB) cluster, you receive the following error:

ERROR: multiple updates to a row by the same query is not allowed (seg5 10.XX.XX.XX:6007 pid=325632)
 

This error has two common causes:

  • The update conditions match multiple rows from the data set. The conditions are non-deterministic, or the data itself is duplicated.
  • A split update is used by legacy planner or GPORCA/Optimizer:
    • With Legacy Planner, a split update is used only when updating distribution columns, where the column is specified in the table as "distributed by <column name>".
    • When using GPORCA/Optimizer, split updates are used for updates on all columns.  

The following is an example of this issue occurring. This example causes both GPORCA and Legacy Planner to throw an error.
gpadmin=# create table tableR (i1 int, i2 int) distributed by (i1);
CREATE TABLE
*Note - GPDB will default to distribution by column "i1" in this case, but it is specified manually here as an example.

gpadmin=# create table tableS (i1 int, i2 int);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i1' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE

gpadmin=# insert into tableR values (1, 1);
INSERT 0 1
gpadmin=# insert into TableS values(1, 2);
INSERT 0 1
gpadmin=# insert into TableS values(1, 7);
INSERT 0 1

gpadmin=# select * from tableS;
 i1 | i2 
----+----
  1 |  2
  1 |  7
(2 rows)

gpadmin=# select * from tableR;
 i1 | i2 
----+----
  1 |  1
(1 row)


gpadmin=# update tableR set i1 = tableS.i2 from tableS where tableR.i2=tableS.i1;
ERROR:  multiple updates to a row by the same query is not allowed  (seg1 192.168.99.101:30001 pid=8173)

 



Resolution

The above examples fails to perform the update since the following condition has non-deterministic output. 

update tableR set i1 = tableS.i2 from tableS where tableR.i2=tableS.i1;


The condition "from tableS where tableR.i2=tableS.i1" matches both rows in tableS. This update statement issues multiple requests to update the row in tableR in this situation. The row in tableR could be set to (2,1), or (7,1). 


To work around this issue, rewrite any queries encountering this issue so that the query does not have non-deterministic update conditions. This is the best option for ensuring queries are issuing updates efficiently.

GPORCA optimizer can be disabled on a per-session basis. This option is not preferred because it only works if the column being updated is not a distribution column for the table. See Changed Behavior with the GPORCA for more information on non-deterministic updates.