Toad World blogs are a mix of insightful how-tos from Quest experts as well as their commentary on experiences with new database technologies. Have some views of your own to share? Post your comments! Note: Comments are restricted to registered Toad World users.
Do you have a topic that you'd like discussed? We'd love to hear from you. Send us your idea for a blog topic.
Monday, April 02, 2007 12:27 PM
In Computability theory there is a famous decision problem called halting problem which can be informally stated as follows:
Given a description of a program and its initial input, this determines whether the program, when executing the input, will ever halt (complete). The alternative is that it runs forever without halting (stopping).
In 1936, Alan Turing proved that a general algorithm to solve the halting problem for all possible inputs cannot exist. It is said that the halting problem is undecidable over Turing machines.
Perhaps you find that the halting problem is similar to one of RDBMS SQL optimization problems. Actually, there are two major problems that modern RDBMS SQL optimizers encounter today. The first one is the limited size of the plan space (the number of execution plans can be investigated during SQL optimization). Due to the fact that the database SQL optimizer has to do real time optimization, it is impossible for database SQL optimizer to do an exhaustive plan space search (search all possible execution plans internally), otherwise the optimization time will be much longer than the time it takes to execute the SQL statement even with a bad execution plan. The second problem is the accuracy of cost estimation algorithm. After the database SQL optimizer has generated all the internal SQL rewrites and their corresponding execution plans, the database SQL optimizer uses the cost estimation algorithm to choose the theoretical best execution plan, the one with the lowest cost, to execute. The problem is similar to the famous halting problem in Computability theory. Of course, the logic embedded in a SQL statement is much less than that of a program mentioned in the Computability theory. So, we still can use tables, indexes, histograms, assumptions, and other statistics to estimate the cost of an execution plan, but the problem now is no longer to tell whether a SQL/program will halt or not. We are facing a more difficult problem, that is, how long a query(with or without inputs) will run with a specific execution plan? Database vendors have spent a lot of effort in this area, but the fact remains that we still have to tune SQL statements ourselves.
Accurate Cost Estimation Versus Plan Space
The goal of a good database SQL optimizer is not only to provide accurate cost estimation for SQL statements, but it should generate more internal execution steps to compose more execution plans. More internal execution plans means that the database SQL optimizer has a bigger plan space during SQL optimization.
Here is an example to show you the relationship between plan space and cost estimation. Consider how you travel to your office and suppose you only have one route to go to the office. If the only way that you go to the office is jammed due to the weather or traffic conditions, you probably will not be able to get to your office on time. Consequently, most people have multiple routes (plan space) in mind. Every morning, based on the weather and traffic conditions, they will select the best route (cost estimation) to office. The more routes they have in mind, the higher the possibility that they can overcome more complex traffic and weather conditions. The point is, with more possible execution plans, every morning they will spend more time thinking about which path is the best way to go to their office. As the number of routes increases, the chance that they select a non-optimal path gets higher. This problem is similar to what the database SQL optimizer faces. The accuracy of the database SQL optimizer’s cost estimation is opposite to the size of plan space that the database SQL optimizer can generate. The bigger the plan space, the easier it is for the optimizer to select a non-optimal execution plan. That is why you will find Oracle’s SQL statement performance always has room for improvement, since Oracle has a relative larger plan space and it is more sensitive to your SQL syntax.