SQL Clause ON versus WHERE

Products

Datacom DATACOM - AD

Issue/Introduction

What is the difference between SQL Clause ON and WHERE?

In relation with LEFT JOIN or/and INNER JOIN

Predicates can be placed in both the ON clause of an INNER or LEFT join, and in the WHERE clause.

So what difference does it make where a predicate is placed?

Environment

z/os, VSE, CA Datacom/SQL 14.0 and above

Resolution

INNER JOIN

For INNER join, there isn't much difference when one of the predicate operands references the right-side table of the ON clause. But if the predicate references only tables accessed previously, then this can be much less efficient because the predicate may not be able to restrict index scan range on the referenced table.

LEFT JOIN

For LEFT join, the semantics are different. In the ON clause, if the predicate evaluates to FALSE for a row, the left side table row is preserved and the right side table column(s) set to the null value. In the WHERE clause, the row is rejected.

For example:

 	SELECT * 	
 	FROM T1 LEFT JOIN T2 ON T1.COL1 = 10 AND T1.COL2 = T2.COL2;

One might expect the result set to only contain rows with T1.COL1 equal to 10, but this is not true. Rows with T1.COL1 not equal to 10 are included with T2 column(s) set to the null value.

When a predicate in an ON clause of a LEFT join does not reference the right-side table, but only tables to the left, then the predicate will not be applied when those tables are being accessed because the predicate must affect only the current join. So this can be much less efficient since index scan range is not restricted.

WHERE

Be careful that predicates in the WHERE clause do not reference right side tables of a LEFT join. The intention of the LEFT join is to preserve the left side table's rows. But since the rows that are preserved (because there is no matching right side table row) have the null value, they are rejected by the predicate in the WHERE clause. So the LEFT join is effectively turned into an INNER join!

For example:

 	SELECT *       
   FROM T1 LEFT JOIN T2 ON T1.COL1 = T2.COL1 
   WHERE T2.COL1 = 10;

Any rows preserved when T2.COL1 is not the same as T1.COL1 are rejected because any null value for T2.COL1 in the preserved row causes the T2.COL1 = 10 predicate to be UNKNOWN, and only TRUE rows are returned in the result set.

If you want to find only rows where there is no matching row, use a LEFT join, and then reference a column of the non-preserved table with the IS NULL predicate in the WHERE clause:

 	SELECT *       
   FROM T1 LEFT JOIN T2 ON T1.COL1 = T2.COL1 
   WHERE T2.COL1 IS NULL;

Now, any matching rows will not be null and be rejected, leaving only rows where there is no matching T2 row. This assumes that there are no rows stored in T2.COL1 with the null value, of course.

In general, it is best to only include join conditions for the current join in an INNER or LEFT join ON clause, and place all other predicates in the WHERE clause.

One last note/hint: If you need to control the join sequence, convert

 	FROM T1, T2, T3

into

 	FROM T1 INNER JOIN T2 INNER JOIN T3

The INNER join syntax forces the join order as written from T1 to T2 to T3.