Wednesday, November 02, 2011 5:53 AM
I have been performing research in the area of index optimization for more than 10 years, and I'm pleased to announce that a new workload-based index advisor is now available in Quest SQL Optimizer 8.5 for Oracle. The first index advisor algorithm I developed is still available in our SQL Rewrite module. That advisor was designed to work on a single SQL statement or a small set of SQL statements and does not consider workload. It embeds a large amount of human knowledge and rules to narrow down the index candidates and so speed up the search process. Rule- and knowledge-based index advisors can work well for small numbers of SQL statements, where the index candidate space is relatively small, but they become ineffective on large SQL populations that incorporate composite indexes and where index recommendations must be based on the actual application workload.
Theoretically speaking, workload-based index selection is an NP-complete problem. There are few workable solutions that can handle thousands of SQL statements at the same time while balancing the workload of each statement. A database expert cannot solve this problem using database knowledge alone, and unfortunately, most products on the market design their algorithms around human assumptions to narrow down the many potential index candidates during the optimization process. Some of this candidate trimming is sound, but much of it is careless, so the output of those products is normally inferior to what a good DBA can produce.
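To see why the problem explodes combinatorially, consider that every ordered, non-empty subset of a table's columns is a potential composite index (column order matters in a composite index). A quick back-of-the-envelope sketch in Python - my own illustration of the candidate-space size, not part of the product:

```python
# Count the possible composite indexes over n candidate columns:
# every ordered, non-empty subset is a candidate, i.e. sum over k
# of n!/(n-k)!. Even before considering multiple tables or index
# types, the space grows factorially.
from math import factorial

def composite_index_candidates(n_columns: int) -> int:
    """Number of ordered, non-empty column subsets."""
    return sum(factorial(n_columns) // factorial(n_columns - k)
               for k in range(1, n_columns + 1))

for n in (3, 5, 10):
    print(n, composite_index_candidates(n))
# 3 columns -> 15 candidates; 5 -> 325; 10 -> 9,864,100
```

Multiply that by every table touched by thousands of SQL statements and it is clear why no fixed rule set, and no human, can enumerate the space exhaustively.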
There are some common problems in most index advisors on the market:
- They cannot provide easy ways to capture SQL workloads.
- They cannot explore all potential index candidates even for a small number of SQL statements.
- They recommend composite indexes with too many columns for the given SQL statement.
- They cannot consolidate several distinct indexes on a table into a single index.
- They cannot automatically adapt to database upgrades and environmental changes.
- They cannot provide a good workload balanced solution for a large number of SQL statements.
- They use a fixed index search time and provide no intermediate solutions.
- They recommend solutions that aren't any better than what a good DBA can provide.
Simply speaking, traditional rule-based algorithms are not capable of optimizing indexing in today's complex database environments - a new way is needed to handle the index selection problem. After researching this problem for a few years, I finally devised a new Genetic Algorithm to solve it. The algorithm is now patent pending, and it differs from the traditional Genetic Algorithm you may find in a textbook in that it adds a new crossover and chromosome encoding mechanism with flexible chromosome length. The most difficult part is the size of the gene pool of composite index candidates. If all composite index candidates were pre-built, the resulting gene pool would overflow; but if I trimmed it down using my own knowledge, the result would be limited by my assumptions and would be no different from the existing products on the market. I finally created a dynamically constructed pool of composite index candidates, plus a non-linear formula to control the selection probability of each composite index. This may sound too technical, but I just want to emphasize that neither the set of potential candidates nor the length of candidates is restricted. Let's come down to earth and show you what we have done in the last few months.
How Does It Work for One SQL Statement?
Let me first show how it works for a single SQL statement in the following example.
where emp_name not between 'A' and 'K'
and emp_sex = 'M'
and emp_salary < 100000
Click “Optimize Indexes” and define a new SQL workload.
Then select the SQL workload source - we support various sources, including Oracle's Automatic Workload Repository (AWR), a Quest Foglight PA Repository, and the SGA, each of which includes the extracted workload for every SQL statement. With Scan Code, you can manually input your SQL workload, which I will discuss further in my next blog.
For a single SQL statement index recommendation, you can just use Scan Code to capture SQL statements from files or the clipboard. I used Scan from Clipboard to capture the above SQL from my clipboard. Once captured, the R.E.F. (Relative Execution Frequency) is used to adjust the workload of your SQL when there is more than one SQL statement to optimize. Our index advisor will consider the workload of each SQL statement to recommend the best indexes by appropriately weighting them by their contributing workload. For our example single SQL statement, you can leave the R.E.F. value at 1.
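To show how R.E.F. weighting might shape the objective, here is a hedged Python sketch. The field names and the simple linear cost-times-frequency weighting are my assumptions for illustration, not Quest's exact formula:

```python
# Toy workload objective: each statement's estimated optimizer cost
# is scaled by its Relative Execution Frequency (R.E.F.), so a cheap
# statement that runs often can outweigh an expensive one-off.

def weighted_workload_cost(statements):
    """Sum of per-statement cost scaled by R.E.F."""
    return sum(s["cost"] * s["ref"] for s in statements)

workload = [
    {"sql": "SELECT ...", "cost": 5477, "ref": 1},   # our single example SQL
    {"sql": "UPDATE ...", "cost": 1200, "ref": 10},  # hypothetical hot statement
]
print(weighted_workload_cost(workload))  # 5477*1 + 1200*10 = 17477
```

With only one statement in the workload, as in our example, the R.E.F. value of 1 simply leaves its cost unscaled.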
Once you have finished your SQL workload setup, you reach the following Search Process page. You can leave everything at its default settings and press the Start button to begin the index search process. Since IO and CPU are the key components of the Cost calculation in the Oracle SQL optimizer, we use Cost as the default Primary goal, which provides a relatively balanced search result between CPU and IO.
For a simple SQL statement like this, an optimal result will appear within a minute. You will also find intermediate solutions appearing on the Search Progress screen from time to time. The Search Progress chart plots four lines representing Cost, CPU Cost, IO Cost and Elapsed Time. Normally these four lines will keep dropping along the time axis in a step-down manner, showing the latest recommendation's estimated reduction (improvement) in resource consumption. Our AI algorithm will continue searching for a better indexing solution until you press the Stop button or the pre-defined stop criteria you set in the Advanced Process Control options are met. A bar chart on the left-hand side of the Search Progress panel shows the estimated improvement of the latest solution.
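The behavior described above - intermediate solutions that improve over time, stoppable at any moment - is the classic "anytime" search pattern. A generic Python sketch of that pattern, with a toy cost model and candidate generator of my own that have nothing to do with the product's internals:

```python
# Generic "anytime" search loop: keep proposing candidates until the
# time budget runs out, remembering the best (lowest-cost) solution
# seen so far. Each improvement is an intermediate solution that
# could be reported to the user immediately.
import random
import time

def anytime_search(evaluate, propose, budget_seconds=1.0):
    """Return the best candidate found within the time budget."""
    deadline = time.monotonic() + budget_seconds
    best, best_cost = None, float("inf")
    while True:
        candidate = propose()
        cost = evaluate(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost  # a new intermediate solution
        if time.monotonic() >= deadline:
            return best, best_cost

random.seed(0)
best, cost = anytime_search(lambda x: abs(x - 7),          # toy cost model
                            lambda: random.randint(0, 100), # toy proposals
                            budget_seconds=0.1)
print(best, cost)
```

A real index advisor would plug its genetic operators into `propose` and a workload cost estimate into `evaluate`; the pattern is what makes both the intermediate solutions and the user-defined stop criteria possible.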
The Most Sophisticated Index Solutions
I am intentionally using this particular SQL statement to illustrate our AI engine's problem-solving ability. You will find that the first solution, generated within the first few seconds, provides a cost reduction of 53.35%, an IO cost reduction of 53.56%, and an elapsed time reduction of 53.03%. But this solution also increases the CPU cost by 6.01%.
Let's take a closer look at this first possible indexing solution. Our algorithm recommended two bitmap indexes, on EMP_NAME and on EMP_SEX. It looks reasonable, and the cost dropped from the original plan's 5,477 to the new plan's 2,555. But due to the extra Bitmap Conversion and Hash Join operations introduced in the new plan, the overall CPU cost has increased. So you can see that index recommendation is not as simple as deciding whether or not to use an index; you have to consider whether a solution's overall resource impact is good for your database before you select it.
Now let me explore the second solution that was generated around 20 seconds after starting the search process. In this solution, the estimated resource consumption is reduced significantly. One interesting finding is that this solution’s CPU Cost reduction is the best among all three solutions since the recommended composite index (EMP_SEX,EMP_SALARY) causes the database SQL optimizer to use simple “Table Access by Index Rowid” and “Index Range Scan” operations. If we used CPU Cost as our Primary search goal, the search process would probably not make any further recommendations no matter how long you were willing to wait.
The third and best solution was generated around 30 seconds after the search process started; you can see it presented in the following screen captures. It is a good and reasonable solution for this SQL statement when the goal is the best estimated Cost reduction. Although there is some sacrifice in CPU cost, this solution provides a better IO Cost reduction than solution 2.
It is easy to overlook the complexity of index optimization, and I hope this example has given you a better understanding of indexing strategy. Without such a tool on the market, you might never know how many choices you have when selecting index optimization solutions for even one SQL statement. My next blog will discuss optimizing indexing for many SQL statements.