In some of our (web based - java) application programs we have a need to run a query that has an extensive number of OR predicates similar to below. If the number of OR predicates exceeds some amount, the query becomes too big for the RWTSA, which we have set to the maximum size (approx 60K). Since the query is constructed 'on the fly' in the program and it's size depends on other query results, the number of OR predicates is not known in advance. The applications programmer solution was to limit the number of ORs to some number (around 50, I believe) and if there were more than this number, add the rest using an IN predicate.
The selection predicates are usually indexed, so the rows are returned quickly if the IN predicate is not needed. However, if it is needed, the process (appears to) use data scanning and possibly a full table traversal. Since some of these tables are large, the execution time gets very elongated or we get 91(106) CBSIO exceeded errors.
My question is: Is there an alternative way of coding the query that is both efficient and not dependent on a max number of predicates?
SELECT FIELD1, FIELD2, FIELD3
FROM TAB1 WHERE KEY-FIELD1 = 'XXX'
AND KEY-FIELD2 = 'YYY'
AND (KEY-FIELD3 = 'A'
OR KEY-FIELD3 = 'B'
OR KEY-FIELD3 = 'C'
OR KEY-FIELD3 = 'X'
OR KEY-FIELD3 = 'Y'
OR KEY-FIELD3 = 'Z') ;
How to get ORed predicates to restrict index scan range when there are more than will fit in the RQA. When there are too many, the predicates aren’t passed to CBS. So if they were restricting an index, now the whole table must be scanned.
CA Datacom/SQL, DBSRVR (Web base - Java)
Here’s the best solution to this, which might help other customers:
When a query has many ORed predicates ANDed with several other predicates, the Request Qualification Area (RQA) that is passed to the Compoun Boolean Selection Facility (CBS) becomes large because every ANDed predicate is duplicated for each OR predicate. This is necessary to provide independent paths for selecting the best index to use.
If the size is higher than the RQA limit, then the predicates are not passed to CBS. If the predicates can restrict the traversal index scan range (into a range for each OR value), that is much more efficient than scanning the entire table.
You can create smaller RQAs within the size limit, by dividing the ORs into multiple subselects using UNION ALL. The RQA for each subselect will be smaller, and the UNION ALL doesn't cause a sort to eliminate possible duplicate rows, as using just UNION does. Sorting the ORed values might eliminate the need for an ORDER BY sort.
SELECT * FROM T1 WHERE A = 1 AND B = 2 AND (C = 1 OR C = 2 OR … C = 98 OR C = 99);
SELECT * FROM T1 WHERE A = 1 AND B = 2 AND (C = 1 OR C = 2 OR … C = 49 OR C = 50)
SELECT * FROM T1 WHERE A = 1 AND B = 2 AND (C = 51 OR C = 52 OR … C = 98 OR C = 99);