When it comes to assuring optimal performance of DB2 applications, coding properly formulated SQL is an imperative. Most relational experts agree that poorly coded SQL and application code is the cause of most performance problems – perhaps as high as 75 to 80 percent of poor relational performance is caused by “bad” SQL and application code.

But writing efficient SQL statements can be a tricky proposition. This is especially so for programmers and developers new to a relational database environment. So, before we delve into the specifics of coding SQL for performance, it is best that we take a few moments to review SQL basics.

The Basics

SQL, an acronym for Structured Query Language, is a powerful tool for manipulating data. It is the de facto standard query language for relational database management systems and is used not just by DB2, but also by the other leading RDBMS products such as Oracle, Sybase, and Microsoft SQL Server.

SQL is a high-level language that provides a greater degree of abstraction than do procedural languages. Most programming languages require that the programmer navigate data structures. This means that program logic needs to be coded to proceed record-by-record through data elements in an order determined by the application programmer or systems analyst. This information is encoded in program and is difficult to change after it has been programmed.

SQL, on the other hand, is fashioned so that the programmer can specify what data is needed, and not how to retrieve it. SQL is coded without embedded data-navigational instructions. DB2 analyzes the SQL and formulates data-navigational instructions "behind the scenes." These data-navigational instructions are called access paths. By having the DBMS determine the optimal access path to the data, a heavy burden is removed from the programmer. In addition, the database can have a better understanding of the state of the data it stores, and thereby can produce a more efficient and dynamic access path to the data. The result is that SQL, used properly, can provide for quicker application development.

Another feature of SQL is that it is not merely a query language. The same language used to query data is used also to define data structures, control access to the data, and insert, modify, and delete occurrences of the data. This consolidation of functions into a single language eases communication between different types of users. DBAs, systems programmers, application programmers, systems analysts, and end users all speak a common language: SQL. When all the participants in a project are speaking the same language, a synergy is created that can reduce overall system-development time.

Arguably, though, the single most important feature of SQL that has solidified its success is its capability to retrieve data easily using English-like syntax. It is much easier to understand the following than it is to understand pages and pages of program source code.

   SELECT   LASTNAME
   FROM     EMP
   WHERE   EMPNO = '000010';

Think about it; when accessing data from a file the programmer would have to code instructions to open the file, start a loop, read a record, check to see if the EMPNO field equals the proper value, check for end of file, go back to the beginning of the loop, and so on.

SQL is, by nature, quite flexible. It uses a free-form structure that gives the user the ability to develop SQL statements in a way best suited to the given user. Each SQL request is parsed by the DBMS before execution to check for proper syntax and to optimize the request. Therefore, SQL statements do not need to start in any given column and can be strung together on one line or broken apart on several lines. For example, the following SQL statement is equivalent to the previously listed SQL statement:

   SELECT LASTNAME FROM EMP WHERE EMPNO = '000010';

Another flexible feature of SQL is that a single request can be formulated in a number of different and functionally equivalent ways. One example of this SQL capability is that it can join tables or nest queries. A nested query always can be converted to an equivalent join. Other examples of this flexibility can be seen in the vast array of functions and predicates. Examples of features with equivalent functionality are:

  • BETWEEN versus <= / >=
  • IN versus a series of predicates tied together with AND
  • INNER JOIN versus tables strung together in the FROM clause separated by commas
  • OUTER JOIN versus a simple SELECT, with a UNION, and a correlated subselect
  • CASE expressions versus complex UNION ALL statements


This flexibility exhibited by SQL is not always desirable as different but equivalent SQL formulations can result in extremely differing performance. The ramifications of this flexibility are discussed later in this paper with guidelines for developing efficient SQL.

As mentioned, SQL specifies what data to retrieve or manipulate, but does not specify how you accomplish these tasks. This keeps SQL intrinsically simple. If you can remember the set-at-a-time orientation of a relational database, you will begin to grasp the essence and nature of SQL. A single SQL statement can act upon multiple rows. The capability to act on a set of data coupled with the lack of need for establishing how to retrieve and manipulate data defines SQL as a non-procedural language.

Because SQL is a non-procedural language a single statement can take the place of a series of procedures. Again, this is possible because SQL uses set-level processing and DB2 optimizes the query to determine the data-navigation logic. Sometimes one or two SQL statements can accomplish tasks that otherwise would require entire procedural programs to do.

The Optimizer

The optimizer is the heart and soul of DB2. It analyzes SQL statements and determines the most efficient access path available for satisfying each statement. It accomplishes this by parsing the SQL statement to determine which tables and columns must be accessed. It then queries system information and statistics stored in the DB2 system catalog to determine the best method of accomplishing the tasks necessary to satisfy the SQL request.

The optimizer is equivalent in function to an expert system. An expert system is a set of standard rules that when combined with situational data can return an expert opinion. For example, a medical expert system takes the set of rules determining which medication is useful for which illness, combines it with data describing the symptoms of ailments, and applies that knowledge base to a list of input symptoms. The DB2 optimizer renders expert opinions on data retrieval methods based on the situational data housed in DB2’s system catalog and a query input in SQL format.

The notion of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Remember, access to DB2 data is achieved by telling DB2 what to retrieve, not how to retrieve it. Regardless of how the data is physically stored and manipulated, DB2 and SQL can still access that data. This separation of access criteria from physical storage characteristics is called physical data independence. DB2's optimizer is the component that accomplishes this physical data independence.

If indexes are removed, DB2 can still access the data (albeit less efficiently). If a column is added to the table being accessed, the data can still be manipulated by DB2 without changing the program code. This is all possible because the physical access paths to DB2 data are not coded by programmers in application programs, but are generated by DB2.

Compare this with non-DBMS systems in which the programmer must know the physical structure of the data. If there is an index, the programmer must write appropriate code so that the index is used. If the index is removed, the program will not work unless changes are made. Not so with DB2 and SQL. All this flexibility is attributable to DB2's capability to optimize data manipulation requests automatically.

The optimizer performs complex calculations based on a host of information. To simplify the functionality of the optimizer, you can picture it as performing a four-step process:

  1. Receive and verify the syntax of the SQL statement.
  2. Analyze the environment and optimize the method of satisfying the SQL statement.
  3. Create machine-readable instructions to execute the optimized SQL.
  4. Execute the instructions or store them for future execution.

The second step of this process is the most intriguing. How does the optimizer decide how to execute the vast array of SQL statements that can be sent its way?

The optimizer has many types of strategies for optimizing SQL. How does it choose which of these strategies to use in the optimized access paths? IBM does not publish the actual, in-depth details of how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means that the optimizer will always attempt to formulate an access path for each query that reduces overall cost. To accomplish this, the DB2 optimizer applies query cost formulas that evaluate and weigh four factors for each potential access path: the CPU cost, the I/O cost, statistical information in the DB2 system catalog, and the actual SQL statement.

Summary

In this article we have learned the basics of SQL and optimization. But this is just the tip of the iceberg. For now, content yourself with the knowledge that you have embarked on the path of understanding DB2 SQL. Next month we will delve into some high-level DB2 SQL coding tips, techniques, and guidelines to help assure optimal performance.