The DB2 optimizer can choose from a variety of different techniques as it creates optimal access paths for each SQL statement. These techniques range from a simple series of sequential reads to much more complicated strategies such as using multiple indexes to access data. In this 2 part series of articles we will learn about some of the most popular techniques used by the optimizer to formulate DB2 access paths.
 
Scans
 
Of the many decisions that must be made by the optimizer, perhaps the most important decision is whether an index will be used to satisfy the query. Before it can determine this, the optimizer must first determine whether an index exists. Remember, you can query any column of any table and an index is not required to do so. So, the optimizer must be able to access non-indexed data; it does this using a scan.
 
The preference of the DB2 optimizer, in most cases, is to use an index. This is true because an index greatly optimizes data retrieval. However, an index cannot be used if one does not exist. And certain types of SQL statements simply are best satisfied with a complete scan of the data. For example, consider the following SQL:
 

SELECT * FROM EMP;

 
Why should DB2 attempt to use an index on this statement? There is no WHERE clause, so a complete scan is best. Furthermore, even when a WHERE clause is specified the optimizer may determine that a sequential scan of pages will outperform indexed retrieval – and therefore an index may not be chosen.
 
Why would non-indexed access outperform indexed access when the primary reason indexes exist in the first place is to improve performance? Well, it is possible that indexed access can be slower than a simple scan. For example, a very small table may have only a few pages. Reading all of the pages may be faster than first reading the index page(s) and then reading the data page(s). Even for larger tables, the organization or number of levels of the index could require additional I/O to satisfy the query in some cases. When an index is not used to satisfy a query, the resulting access path uses a table scan (or table space scan).
 
A table scan generally reads every page of the table. In certain circumstances, though, DB2 can be smart enough to limit the pages that are scanned. Additionally, DB2 can invoke sequential prefetch to read pages before they have been requested. Sequential prefetch is particularly useful when a SQL request needs to access rows sequentially, in the order in which they are stored on disk. The optimizer can signal that sequential prefetch should be used when it determines that the query will sequentially read data pages. Table scans frequently can benefit from the read ahead processing of sequential prefetch because the data already will be in memory when it is requested by the query.
 
Indexed Access
 
Generally speaking, the fastest way to access DB2 data is to use an index. Indexes are structured in such a way as to increase the efficiency of finding a particular piece of data. Figure 1 shows the structure of a b-tree index. You can see that by simply traversing the index from the root to the leaf page you can quickly find the appropriate data page where your requested data exists. But, the manner in which DB2 uses an index varies from statement to statement. DB2 uses many different internal algorithms to traverse an index structure.
 
Figure 1. The Structure of a b-tree Index
 
Before DB2 will use an index to satisfy a data access request, the following criteria must be met:
  • At least one of the SQL predicates must be indexable. Certain predicates are not indexable by their very nature and therefore the optimizer will never be able to use an index to satisfy them.
  • One of the columns (in any indexable predicate) must exist as a column in an available index.

 

So, you see, the requirements for DB2 to consider using an index are quite simple. But there is much more learn about indexed access in DB2. There are actually different types of indexed access.
 
The first, and most simple, type of indexed access is the direct index lookup. For a direct index lookup, DB2 will start at the top using the root page of the index and follow down through the intermediate leaf pages until it reaches the appropriate leaf page. There, it will read the pointer to the actual data page. Based on the index entries, DB2 will read the correct data pages to return the desired result. In order for DB2 to perform a direct index lookup, values must be provided for each column in the index. For example, consider an EMPLOYEE table with an index on the DEPTNO, TYPE, and EMPCODE columns. Now consider the following query:
 
SELECT   FIRSTNAME, LASTNAME
FROM     EMPLOYEE
WHERE    DEPTNO = 5
AND      TYPE = ‘X’
AND      EMPCODE = 10;
 
If only one or two of these columns were specified, a direct index lookup could not occur because DB2 would not have a value for each column and could not match the full index key. Instead, an index scan could be chosen. There are two types of index scans: matching index scans and non-matching index scans. A matching index scan is sometimes called absolute positioning; a non-matching index scan is sometimes called relative positioning. Recall our earlier examination of table scans? Index scans are similar. In an index scan the leaf pages of the index are read sequentially.
 

A matching index scan begins at the root page of an index and works down to a leaf page in much the same manner as a direct index lookup does. However, because the complete key of the index is unavailable, DB2 must scan the leaf pages using the values that it does have, until all matching values have been retrieved. Consider a re-write of the previous query, this time without the EMPCODE predicate:

SELECT   FIRSTNAME, LASTNAME
FROM     EMPLOYEE
WHERE    DEPTNO = 5
AND      TYPE = ‘X’;
 

A matching index scan can locate the first leaf page with the appropriate value for DEPTNO and TYPE by traversing the index starting at the root. But there can be multiple index entries with this combination of values and different EMPCODE values. Therefore, leaf pages to the right are sequentially scanned until no more valid DEPTNO, TYPE, and varying EMPCODE combinations are encountered.

For a matching index scan to be requested, you must specify the high order column in the index key, which is DEPTNO in the preceding example. This provides a starting point for DB2 to traverse the index structure from the root page to the appropriate leaf page. But what would happen if you did not specify this high order column? Suppose that you alter the sample query such that a predicate for DEPTNO is not specified:

 

SELECT   FIRSTNAME, LASTNAME
FROM     EMPLOYEE
WHERE    TYPE = ‘X’
AND      EMPCODE = 10;
 
In this instance, a non-matching index scan can be used. In this case DB2 cannot use the index tree structure because the first column in the key is unavailable. A non-matching index scan begins with the first leaf page in the index and sequentially scans subsequent leaf pages, applying the available predicates. The root page and any intermediate leaf pages are not used.
 
A special type of index scan is known as index only access. DB2 can avoid reading data pages completely if all the required data exists in the index. For example:
 
SELECT   DEPTNO, TYPE
FROM     EMPLOYEE
WHERE    EMPCODE = 10;
 
Remember, we have an index on the DEPTNO, TYPE, and EMPCODE columns. In the previous query these are the only columns requested. Therefore, DB2 need never access the table because all of the data exists in the index.
 
Another type of indexed access that can be used by DB2 is multiple index access. Multiple index access will use more than one index for a single access path. For example, consider a query against our employee table where we only have two indexes: IX1 on the EMPNO column and another, IX2, on the DEPTNO column. The following query is then requested to show which employees work in a certain department:
 
SELECT   LASTNAME, FIRSTNME, MIDINIT
FROM     EMPLOYEE
WHERE    EMPNO IN ('000100', '000110', '000120')
AND      DEPTNO = 5;
 
Should DB2 use IX1 for the EMPNO predicate or IX2 for the DEPTNO predicate? Why not use both of them? This is the essence of multiple index access. There are two types of multi-index access, depending on whether the predicates are tied together using AND or OR.
 
Summary
 
In this first part of our 2-part series on DB2 optimization and access paths we uncovered the basics of optimization and discussed indexed access versus scanning data. Next month, the second part of this series will delve into join methods.