Hello, you are not logged in.  Login or sign up
Community >> Blogs
Search Toad World Search

Do you have a topic that you'd like discussed?  We'd love to hear from you.  Send us your idea for a blog topic.

Why Join Path Matters
 
Location: Blogs Richard To's Blog    
 RichardTo Monday, November 13, 2006 12:28 PM
The Nested Loop join operation is the basic join operation which is supported by most RDBMS, since it requires less memory and less temporary space. Normally, it can provide faster data response time than other join operations. But, the path of a Nested Loop join will significantly affect the speed of the join operation. Let’s use a two table join as an example to understand how this works.

select * from A,B
where A.key = B.key

 

Let’s assume that A.key and B.key are unique B-Tree indexed
(assume B-tree has two nodes for each parent)
A table has 100,000 records
B table has 1000 records

If we focus on the Nested Loop join operation and assume these two tables are cached in memory, let’s calculate the number of operations to retrieve both tables for the two possible join paths:

Path from table A to table B: which means that we open table A, looking at each row to then use an index to search for matching rows in table B:

Number of Operations (AB) = 100,000 * RoundUp(LN(1000) / LN(2)) / 2 = 100,000 * 10 / 2 = 500,000

Where LN(1000)/LN(2) is the height of the B-tree for index of B.key, half the height is assumed as an average searching depth for a specific record from B.key.

Path from table B to table A: which means that we open B table, looking at each row to then use an index to search for matching rows in table A:

Number of Operations (BA) = 1000 * RoundUp(LN(100,000) / LN(2)) / 2 = 1000 * 17 / 2 = 8,500

Where LN(100,000)/LN(2) is the height of the B-tree for index of A.key, half the height is assumed as an average searching depth for a specific record from A.key.

According to the calculation, you will find that the path from BA is around 500,000/8,500 ~ 59 times faster than the speed of AB. This explains why some SQL statements with a wrong driving path can be tuned up to hundreds of times faster. A lot of programmers may have learned from experience that normally to use the small table to drive a bigger table for Nested Loop join produces faster results. When a SQL statement is simple in natural, it is likely that a programmer will write the SQL statement using the best driving path. But in a real live application, the SQL statements can be far more complicated than a simple 2 table join operation. For example, the key for both tables may not be unique. There may be some filtering criteria for both tables that make it so that no histogram can be referenced or the histogram’s granularity is too big to accurately reflect the truth. These make it so that the database SQL optimizer cannot accurately estimate the cost of each join path. Human expertise is needed to solve the problem when the database SQL optimizer chooses a poor execution plan.  Furthermore, for many tables join situation, the number of permutation is growing exponentially that cannot be investigated by database SQL optimizer in real time, for example, I came across a production SQL statement with 13 tables join from a customer, there is a Company Code that almost all tables are referencing like this:

Select * from A1,A2,A3,….,A13
Where A1.com_code=A2.com_code
And A2.com_code=A3.com_code
And ….
And A12.com_code=A13.com_code

So, the possible driving paths will be up to 13!= 6227020800, I believe that database internal SQL optimizer pick up a random path and run the SQL right away will be much faster than investigating all possible paths.  So, it is a gamble of the cost you pay for SQL optimization and the time you can save to run the SQL after SQL optimization.  So, database vendors normally will bet on both sides with little wager in this gamble by searching limited paths.

Up to this moment we are focusing on discussion of the total cost of returning all records from a Nested Loop join, for response time of returning partial records from a SQL; a Nested Loop join is a good choose among all other join methods, since it is no need to wait for sort, merge or hash process to complete before it can return records. 

 For other join methods, such Hash Join or Sort Merge, the join path may not be as significantly as Nested Loop join. But in some situations, like those illustrated in Figure 1 and Figure 2 in my last blog, you may find that the join path still plays a major role in the performance of a SQL statement.

Next month – “How to control the path of two tables join”

Copyright ©2006 Quest Software Inc.
Permalink |  Trackback

Comments (2)   Add Comment
By GTM123 on Friday, November 24, 2006 11:02 AM
Thanks, this is great info for tuning Oracle queries; you might want to point out that a coder will need to over-ride the CBO to use this technique.

By RichardTo on Sunday, November 26, 2006 6:00 PM
Yes, in certain situations, coders need to guide database optimizer to pick up a better query plan, at this moment, human still better than machine.
Give you one more question; is it easy for a machine to accurately calculate the cost of a sort of a table?
The answer is no, since the number of operations of a sort will depend on the natural order of those records stored on the table, the more those records are in order, the faster the sort will be.


Comment:
Add Comment   Cancel 
Search Blog Entries
 
Copyright 2008 by Quest Software  | Terms Of Use | Privacy Statement | Contact Us