Toad World blogs are a mix of insightful how-tos from Quest experts as well as their commentary on experiences with new database technologies. Have some views of your own to share? Post your comments! Note: Comments are restricted to registered Toad World users.
Do you have a topic that you'd like discussed? We'd love to hear from you. Send us your idea for a blog topic.
Thursday, April 07, 2011 4:50 AM
Written by Richard To
For Part I & Part II, please visit:
A Data Warehouse (DW) is a database system used in an organization to collect data from one or more data sources and is typically used to generate reports or run ad-hoc queries that aggregate data for business analysis purposes. A DW generally consists of an ETL
(Extract, Transform, and Load) tool, a database, a reporting tool and other facilitating tools such as a data modeling tool. The nature of a DW is to store large quantities of data and utilize numerous B-tree indexes, bitmap indexes, views and materialized views to facilitate business professionals performing data mining, online analytical processing, market research and decision support.
In order to better utilize Quest SQL Optimizer features, I will categorize SQL into the following types with the corresponding method to shorten the time to tune a DW SQL.
ETL and Distributed SQL Statements Tuning
Extract, transform and load SQL statements that use distributed queries to consolidate data from multiple databases through dblinks may suffer from significant performance degradation if the driving site is not selected correctly by the database SQL optimizer.
To enable the use of the Driving Site hint in Quest SQL Optimizer for Oracle, you have to set the Optimize Intelligence using the Options dialog in one of the following ways:
- Use the predefined setting Intelligence Level 4 or higher
- Enable Custom Settings and then manually enable the hint on the Optimization page as follows:
Due to the high complexity of DW SQL statements, I recommend you manually enable the Driving Site hint if you have a SQL with dblinks. It will significantly reduce the time to test run too many alternatives using Intelligence Level 4. I have found that most of the problems with dblink SQL statements are simply caused by the wrong driving site selection by the database SQL optimizer.
The following is a simple example that compares a local database Employee table with a remote (QSO) Employee table. The original plan on the left hand side shows that a Hash Join Anti is executed in the local database and remote SQL execution is sent to QSO database. You can see the Cardinality of this remote SQL statement is up to 2,399,992. That means that if the SQL is processed in the local database there will be up to 2,399,992 emp_ids sent back from the remote database. Network delay is unpredictable and I believe that Oracle is not taking it into consideration. You can see the same table Employee being retrieved locally or remotely has similar cost in both red boxes on left and right hand sides. You will find that adding a driving site hint in Alt8 causes Oracle to use the Remote QSO database as the driving database thus reducing the number of emp_ids sent over the network by 4 times (599,998 sent to remote database). Since count(*) is the only result needing to be sent back to the local database, the returning traffic is minimal.
where emp_id not in (select emp_id
Join Path Control with Driving Site Hint
The following SQL involves three databases which are Local, QSO and Win11gr2. A driving site hint has been added to SQL Alt2 to force the SQL to be processed first by the Win11gr2 database and the resulting execution plan is now changed to join Local’s Employee table and Win11gr2’s Employee table. The problem in the original plan is that although the Win11gr2’s Employee table can quickly filter out a lot of records from the Local’s Employee table, it was unfortunately arranged as the last step (in the red box of original execution plan on the right hand side). You can see that the driving site hint not only controls which database is going to process your SQL statements, but also influences the join path of your SQL statements. So the fun of tuning your ETL or distributed SQL statements can be thought of as a game to reach a balance between network traffic overhead and join path speed.
where emp_id in (select emp_id
and emp_id not in (select emp_id
I will discuss the following topics in coming series of “How to Use Quest SQL Optimizer to Tune Data Warehouse SQL ”
- SQL with bind variables and hardcoded literals
- Data warehouse indexing
- SQL fired from a tool