In old version of Oracle database, it is easy to control the join path by reordering the tables in table list after the FROM clause, but it is getting difficult to control the join path in today’s cost based SQL optimizer, I am going to introduce method that is applicable in most databases such as Oracle, Sybase, DB2 and SQL Server in the following:
To control a join path, we cannot tell the database SQL optimizer which path is the best one to select. Instead, we add something to the syntax of the SQL statement that causes an increase to the cost of the current join path selected by database SQL optimizer. Let’s take a look at the following example of two table join scenarios.
| SQL Text |
Paths |
|
select * from A,B
where A.key = B.key
|
 |
If we consider the Nested Loop join, two paths can be considered by database SQL optimizer, which are A→B and B→A. After the cost estimation, the database SQL optimizer may think that B→A has a lower cost, so, the database SQL optimizer will select the join path of B→A. If we know that the join path selected by database SQL optimizer is not the optimal path, we should be given an opportunity to influence the database SQL optimizer to select another path. For some databases, like Oracle, if you know which path is the best, you can use the Optimization Hints to influence the database SQL optimizer to pick the right path, but in some situations, your program language may not allow you to embedded Optimization Hints as a comment to send to Oracle, or in some databases, such as IBM DB2 UDB, there are no optimization hints available. Therefore, the rewriting of the SQL syntax is the only way that can be used to influence the database SQL optimizer to pick the right path. Let us rewrite the following SQL syntax and assume that the datatype for A.key and B.key is numeric.
|
select * from A,B
where A.key + 0 = B.key
|
In the Oracle and Sybase Adaptive Server, the index search of A.key on table A is disabled by changing the syntax to A.key + 0. The addition of +0 does not affect the value of A.key, but it does cause a full table scan on A table. The cost estimation of the join “from table B to index search table A” will be artificially increased by this new syntax and if the new cost is higher than the path of A→B, the database SQL optimizer will pick up the execution plan of “from table A to index search table B”.
In a real life situation, the database SQL optimizer may not actually select the expected Nested Loop execution plan “from table A to index search table B” when you change the syntax to A.key + 0. This syntax change would result in increasing the cost of the original Nested Loop execution plan “from table B to index search table A”, but the database SQL optimizer may select second lowest cost execution plan which could be a Hash Join or a Sort Merge instead of the Nested Loop execution plan.
For Microsoft SQL Server and IBM DB2 UDB, since the database internal SQL rewriter can eliminate the redundant “+0” operation, the following syntax should be used to increase the cost of a specific driving path.
|
select * from A,B
where coalesce(A.key,A.key) = B.key
|
Remark
Since IBM DB2 UDB v8 and Microsoft SQL Server 2005 have stronger internal SQL rewrite abilities in their latest versions, the coalesce(col,col) can be resolved by the database SQL optimizer during parsing to a Case operation. Therefore, the index that you are trying to disable will remain in the execution plan. A deeper nested coalesce(coalesce(col,col),col) can be used to overload the parser and increase the specific cost weighting.