Minimize
Blogger List

Johannes Ahrends
Toad and Oracle

Ben Boise
Toad SC Discussions

Kevin Dalton
Benchmark Factory

Steven Feuerstein
PL/SQL Obsession

Devin Gallagher
Toad SC discussions

Stuart Hodgins
JProbe Discussions

  Henrik "Mauritz" Johnson
Toad Tips & Tricks on the "other" Toads
  Mark Kurtz
Toad SC discussions
  Michael Lumbard
Toad SC discussions
Daniel Norwood
Toad for Data Analysts,
Toad Extension for Visual Studio
Debbie Peabody
Toad for Data Analysts
Gary Piper
Toad Reports Manager
John Pocknell
Toad for Oracle, JProbe
Kuljit Sangha
Toad SC discussions
Bert Scalzo Indicates Oracle ACE status
Toad for Oracle, Data Modeling, Benchmarking
Jeff Smith
Toad product family
Richard To
SQL Optimization
Jim Wankowski
DB2 - LUW and z/OS
John Weathington
  Toad Data Modeler Opens in a new window
Data Modeling
 
  Real Automated Code Testing for Oracle
Quest Code Tester blog
 
Minimize
Blog Tags
toad for oracle (122)
oracle (62)
plsql (46)
sql optimization (37)
toad for data analysts (28)
code tester (19)
toad for ibm db2 (13)
automation (11)
batch optimizer (10)
virtualization (10)
schema browser (9)
toad for sql server (9)
data grid (8)
sql (8)
sql editor (8)
toad data modeler (8)
benchmark factory (7)
excel (7)
query builder (7)
report manager (7)
toad extension (7)
visual studio (7)
11g (6)
configuration (6)
freeware (6)
health check (6)
vmware (6)
connect (5)
dba module (5)
er diagrammer (5)
F4 (5)
linux (5)
refactoring (5)
spotlight (5)
unicode (5)
compare (4)
debugger (4)
export (4)
formatter (4)
make code (4)
rman (4)
strip code (4)
benchmark (3)
bfscript (3)
bulk collect (3)
code templates (3)
code xpert (3)
database browser (3)
db2 (3)
notebook (3)
oem (3)
RAC (3)
session browser (3)
speed (3)
sql optimizer (3)
toad for mysql (3)
tpc-c (3)
9.7 (2)
alert log (2)
app designer (2)
awr (2)
code insight (2)
code snippets (2)
collection (2)
compare and sync (2)
compliance (2)
data generator (2)
data warehouse (2)
database explorer (2)
database monitor (2)
explain (2)
forall (2)
ftp (2)
group execute (2)
handbook (2)
installation (2)
job scheduler (2)
multi-task (2)
nested table (2)
os command (2)
profiler (2)
recovery (2)
release history (2)
save as (2)
schema compare (2)
sql recall (2)
stats pack (2)
subversion (2)
team coding (2)
trace file browser (2)
while loop (2)
10g (1)
64 bit (1)
7zip (1)
action (1)
addm (1)
alter (1)
ansi join (1)
array (1)
ccleaner (1)
code coverage (1)
code road map (1)
CRON (1)
cursor for loop (1)
data browser (1)
data subset (1)
database probe (1)
dbms_flashback (1)
dbms_profiler (1)
ddl (1)
feuerstein (1)
filezilla (1)
flash drive (1)
flow control (1)
for loop (1)
group policy manager (1)
hints (1)
import (1)
index (1)
inheritance (1)
invoker rights (1)
ipad (1)
java (1)
latency (1)
log switch (1)
logical model (1)
ltrim (1)
master-detail browser (1)
monitor (1)
multi-select (1)
naming standards (1)
network (1)
object explorer (1)
OEBS (1)
package (1)
parser (1)
partitioning (1)
performance (1)
pragma (1)
project manager (1)
RAT (1)
revo (1)
REXEC (1)
schema report (1)
script manager (1)
search (1)
set operator (1)
sga (1)
slow (1)
sonarsource (1)
source control (1)
space projection (1)
sql monitor (1)
sql navigator (1)
sql script (1)
sql tracker (1)
sql*plus (1)
standards (1)
statistics (1)
stored procedure (1)
string parser (1)
sub-model (1)
sub-type (1)
synch (1)
synchback (1)
TELNET (1)
toad (1)
trace (1)
unit test (1)
unix (1)
usb (1)
utility (1)
v10 (1)
v9.5 (1)
version control (1)
waits (1)
workload replay (1)
workspace (1)
xml (1)
 
WELCOME, GUEST
 
 

Blogs
Toad and Database Commentaries

Toad World blogs are a mix of insightful how-tos from Quest experts as well as their commentary on experiences with new database technologies.  Have some views of your own to share?  Post your comments!  Note:  Comments are restricted to registered Toad World users.

Do you have a topic that you'd like discussed?  We'd love to hear from you.  Send us your idea for a blog topic.


Mar 13

Written by: Richard To
Friday, March 13, 2009 4:11 AM  RssIcon

This blog is the first in a series about misconceptions surrounding SQL tuning that are quite common. The first one covers the misconception that you can use the estimated cost from the database SQL optimizer to accurately judge the performance of a SQL statement in comparison to its rewrites.
 
SQL tuning is a very interesting topic and most DBA or developers have at least some experience with tuning SQL. They learn some tuning concepts from different sources such as the internet, tuning books, or their colleagues. There is no single scientific tuning guideline or method in the field.   This gives us the impression that SQL tuning is more of an art than scientific knowledge. I have been in this field for almost 20 years and have experience doing SQL tuning on at least 4 database platforms. Basically, I have found that each database internal optimizer’s behavior is different from the others. For example, IBM LUW(Unix) has a very strong internal rewrite that can transform your SQL to a relative standardized internal syntax in which you don’t have any control over the ultimate query plan’s generation. In contrast, the Oracle SQL optimizer is the most syntax sensitive SQL optimizer in the market which allows you to influence this optimizer to select a better plan. Microsoft SQL Server and Sybase came from the same architecture in the beginning. Consequently, their SQL optimizers behave similarly, but now they are getting more diverse from version to version.
 
I cannot say which database platform’s approach to SQL optimization is better, but what I can tell you is that Oracle is the most aggressive and open database SQL optimizer among all databases. I believe the architects of Oracle are the first few people who admit the limitation of their SQL optimization technology. They do not pretend to provide you with a mighty database internal SQL optimizer Instead they provide you Hints, SQL Outlines, SQL Profiles and even the latest SQL Plan Baselines features to help you to rectify their SQL optimization problems. Actually, this trend is getting popular. Now, every database platform provides at least some features for you to influence or freeze a SQL query plan.
 
It is no doubt that SQL optimization is still one of the challenges of today’s database research. We are still suffering from bad performance SQL from day to day. Since there are many common misconceptions floating around SQL optimization, I would like to make some clarifications in this corner and hope it will be helpful for you while you are tuning your SQL statements.
 
First Misconception:

The estimated cost is a good way to judge the performance of a SQL and its rewrites

It is quite a common misconception that people should use the cost to judge a SQL statement’s performance. Statistically, if we are talking about hundreds of SQL statements, this judgment is correct. For example, if we have a pool of two hundred SQL statements that is running on the same database and if we select the set of one hundred SQL statements with highest estimated cost from this pool and compare the execution time of the set with the other set of one hundred SQL statements with lowest cost from the same pool, the result is probably as expected, the set with the lowest cost will be the better performing set. (This explanation is assuming that there is no one SQL statement that dominates the total execution time.) But if you randomly select two SQL statements to compare their cost and execution time, unless the cost is significantly different up to multiple times, it will be hard to tell which SQL will run faster just by the cost estimation. The reason for this is that the cost estimation algorithm of the database SQL optimizer is not as accurate as what we expect it to be. There are many reasons for why it is not accurate, especially for complex SQL statements. The most common errors found in cost estimations are filtering and join cardinality estimation.

Here is an example to demonstrate the problems a database optimizer will encounter.
select emp_name
  from employee, manager
 where emp_mgr_id=mgr_id
   and mgr_name like “% Peter %” 
This SQL extracts all employees’ names whose reporting manager’s name is like “% Peter %”. Since there is no information for the database SQL optimizer to estimate how many managers will have a name like “% Peter %” before the execution, the SQL optimizer will fail to make an accurate estimation for this filtering step in the query plan which is at the first stage of the plan. It will create an arbitrary number at this stage. Another problem is the join cardinality estimation. In a very ideal situation where the emp_mgr_id is equally distributed in the employee table, the SQL optimizer can use the emp_mgr_id histogram to roughly estimate the matched records number, but the accuracy will depend on the arbitrary number created in the first stage. In contrast, in a real life situation, the emp_mgr_id distribution may be skewed in the employee table, so, it will only be by luck that the SQL optimizer can provides an accurate cost estimation of the join cardinality.
 
Let me give you another more mathematical example with the following:
select *
 from A,B
 where A.f1 = B.key 
In this example, the database SQL optimizer has no problem in estimating an accurate cardinality of the join result, since both tables are retrieved without filter criteria. Theoretically, the database optimizer shouldn’t have any problem to do a 100% accurate cost estimation, but a correct cardinality estimation doesn’t mean that you will have a 100% accurate cost estimation. Why?  
 
To simplify the discussion, let me assume the SQL optimizer can generate only two query plans for this SQL. The first query plan is a Nested Loop Join that uses table A to index search of table B using B.key. The other plan is a Sort Merge Join, in which table A and table B are sorted and then merged together. Let’s further simplify the discussion by assuming both tables have the same structure and the same number of records N.
 
The cost of the Nested Loop is ~ N*log2N/2 ~ O(NlogN),
where log2N/2 is the average depth of the a balance B-tree index and assuming that two nodes for each parent node.
 
The cost of Sort Merge will be in the order of O(NlogN).
 
So, you will find that both join methods have similar orders of speed. The difference will be the scale and cost of the individual operation. Let’s further simplify the situation and assume that the scale and the operation cost are the equal for both join methods, and see whether we can distinguish which join method will use less resource or run faster. The answer is still “not 100% sure!” Why?
 
You may be aware that the sort algorithm speed depends on the input data’s natural order. An ordered data will be a benefit to a Merge sort, but it will be a disadvantage to a Quick sort. It is quite common that the data in a database has at least some sort of order that is created by the users’ operation behavior. So, even though I made a lot of assumptions and tried to make everything isolated, I still cannot 100% accurately estimate the cost of a Nested Loop join and a Sort Merge join for this simple SQL statement, of course, I am talking about 100% accurate estimation here. But for the commercial database, we don’t need that rigid of a requirement, maybe 70% accuracy will be good enough, but please remember that this accuracy rate is not a constant; it will get lower in proportion as your SQL complexity gets higher.
 
Conclusion
 
The above mentioned examples are only part of the reasons that the database optimizer cannot carry out accurate cost estimation for your SQL statements. There are many other reasons that have not been discussed here. So, next time when you tune SQL statements, you will get the best result if you test run your rewrites or create physical indexes instead of relying on the virtual indexes cost estimation to judge which SQL rewrite or new indexes are better. I put this as the first discussion topic, since it is a very common mistake, and even some senior members in our team have this misconception sometimes!

Search Blog Entries