When attempting an update statement in a Greenplum Database (GPDB) cluster, you receive the following error:
ERROR: multiple updates to a row by the same query is not allowed (seg5 10.XX.XX.XX:6007 pid=325632)
This error has two common causes:
gpadmin=# create table tableR (i1 int, i2 int) distributed by (i1); CREATE TABLE *Note - GPDB will default to distribution by column "i1" in this case, but it is specified manually here as an example. gpadmin=# create table tableS (i1 int, i2 int); NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i1' as the Greenplum Database data distribution key for this table. HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. CREATE TABLE gpadmin=# insert into tableR values (1, 1); INSERT 0 1 gpadmin=# insert into TableS values(1, 2); INSERT 0 1 gpadmin=# insert into TableS values(1, 7); INSERT 0 1 gpadmin=# select * from tableS; i1 | i2 ----+---- 1 | 2 1 | 7 (2 rows) gpadmin=# select * from tableR; i1 | i2 ----+---- 1 | 1 (1 row) gpadmin=# update tableR set i1 = tableS.i2 from tableS where tableR.i2=tableS.i1; ERROR: multiple updates to a row by the same query is not allowed (seg1 192.168.99.101:30001 pid=8173)
The above examples fails to perform the update since the following condition has non-deterministic output.
update tableR set i1 = tableS.i2 from tableS where tableR.i2=tableS.i1;
The condition "from tableS where tableR.i2=tableS.i1" matches both rows in tableS. This update statement issues multiple requests to update the row in tableR in this situation. The row in tableR could be set to (2,1), or (7,1).
To work around this issue, rewrite any queries encountering this issue so that the query does not have non-deterministic update conditions. This is the best option for ensuring queries are issuing updates efficiently.
GPORCA optimizer can be disabled on a per-session basis. This option is not preferred because it only works if the column being updated is not a distribution column for the table. See Changed Behavior with the GPORCA for more information on non-deterministic updates.