Greenplum returns the wrong results
search cancel

Greenplum returns the wrong results

book

Article ID: 296554

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

This is normal behavior based on engineering's information, but there is a "not exists" workaround if needed. We have two tables that share a text key:
CREATE TABLE table1(key text not null);
CREATE TABLE table2(key text not null);

Table1 contains 11822664 distinct keys and table2 contains 11674868 distinct keys. The result of the following query is zero:
SELECT COUNT(DISTINCT key)
FROM table1
WHERE key NOT IN (SELECT key FROM table2);

Note: table1 has more distinct keys than table2.

Furthermore, the following query returns the correct result:
SELECT COUNT(DISTINCT table1.key)
FROM table1
LEFT JOIN table2 ON table1.key = table2.key
WHERE table2.key IS NULL;

The result is: 5593730

Environment

Product Version: 5.28

Resolution

Workaround

  1. NOT IN” is not equal to "NOT EXISTS" semantically. One of the differences is that, if the inner table contains any NULL tuple, there would be no tuple produced for "NOT IN".
  2. From the perspective of implementation, "NOT IN" is performed by Left Anti Semi Join, while "NOT EXISTS" is performed by normal Anti Join.
  3. For this case, there are some NULL tuples for `tables2.md5`, and that's why "NOT IN" clause does not produce any tuples.
select count(distinct "md5")
from all_keys2 a
where not exists (
select b."md5"
from exclude_keys2 b
where b."md5" = a."md5"

count
---------
5593730