Introduction to Relational Algebra and Relational Calculus

Oracle Community

Introduction to Relational Algebra and Relational Calculus

by Iggy Fernandez

Those who are in love with practice without knowledge are like the sailor who gets into a ship without rudder or compass and who never can be certain [where] he is going. Practice must always be founded on sound theory.

The Discourse on Painting by Leonardo da Vinci

Note: This article and the companion article Equivalence of Relational Algebra and Relational Calculus should be read in conjunction with the The Hitchhiker’s Guide to the EXPLAIN PLAN series of blog posts.

The secret sauce of relational database management systems is composed of what the inventor of the relational model, Dr. Edgar Codd, referred to as “relational calculus” and “relational algebra.” He was a mathematician by training so he instinctively used complex-sounding mathematical terms that normal folk don’t use a lot. Relational calculus has nothing to do with the calculus you encountered in high school and college. A relational calculus expression is an English-like non-procedural specification of the user’s query requirements; for example, “employees who have worked in all accounting job classifications.” Relational algebra on the other hand is a step-by-step procedural recipe that details how to meet the user’s requirements.

More precisely, relational algebra is a collection of operations that could be used to combine tables. Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like “selection,” “projection,” “union,” “difference,” and “join” (refer to Table 1 for the definitions). The whole reason why you and I are so interested in “EXPLAIN PLAN” is because it documents the sequence of relational algebra operations that Oracle Database used at run-time to execute any particular query; that is, it documents the query execution plan used by Oracle Database.

Operation

Definition

SQL Implementation

Selection

Form another table by extracting a subset of the rows of the table of interest using some criteria.

SELECT *
FROM [table]
WHERE [criteria]

Projection

Form another table by extracting a subset of the columns of the table of interest. Any duplicate rows that are formed as a result of the projection operation are eliminated. This requires the use of the DISTINCT clause if the column list does not include columns that uniquely identify each row. A strict theoretical approach requires that SELECT DISTINCT always be used in place of SELECT but this would cause older versions of Oracle Database to perform unnecessary sorting so it was advisable to use the DISTINCT clause only when absolutely necessary. Newer versions of Oracle Database make an effort to avoid unnecessary sorting but the old advice still stands.

SELECT DISTINCT [column list]
FROM [table]

Union

Form another table by selecting all rows from two tables of interest. If the first table has M1 rows and the second table has M2 rows, then the resulting table will have at most M1 + M2 rows. Duplicates are eliminated from the result.

SELECT *
FROM [first table]
UNION
SELECT *
FROM [second table]

Difference

Form another table by extracting only those rows from one table of interest that do not occur in a second table.

SELECT  *
FROM [first table]
MINUS
SELECT *
FROM [second table]

Join (Cartesian)

Form another table by concatenating records from two tables of interest. If the first table has M1 rows and the second table has M2 rows, then the resulting table will have M1 *  M2 rows, and, if the first table has N1 columns and the second table has N2 columns, then the resulting table will have N1 + N2 columns.

SELECT *
FROM [first table] CROSS JOIN [second table]

Table 1 - Five relational algebra operations

We need an example. Let’s use the HR sample schema that comes with Oracle Database. Let’s use the above five operations to answer the question: “Which employees have worked in all accounting job classifications; that is, those for which the job_id starts with the characters AC.” The current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. Here is one of the ways in which this requirement can be expressed in SQL:

SELECT employee_id FROM employees
MINUS
SELECT employee_id
FROM
  (SELECT employees.employee_id,
    jobs.job_id
  FROM employees
  CROSS JOIN jobs
  WHERE jobs.job_id LIKE 'AC%'
  MINUS
    ( SELECT employee_id, job_id FROM job_history
    UNION
    SELECT employee_id, job_id FROM employees
    )
  )

It’s not very obvious but the above SQL statement specifies a total of eleven relational algebra operations: one selection operation, six projection operations, one union operation, two difference operations, and one join operation. This is clear in the following version that uses “common table expressions” to express the requirement. Note that we do not need to use the DISTINCT clause in any of the projection operations.

WITH
  -- Step 1: Projection
  all_employees_1 AS
  ( SELECT employee_id FROM employees
  ),
  -- Step 2: Projection
  all_employees_2 AS
  ( SELECT employee_id FROM employees
  ),
  -- Step 3: Projection
  all_jobs AS
  ( SELECT job_id FROM jobs
  ),
  -- Step 4: Selection
  selected_jobs AS
  ( SELECT * FROM all_jobs WHERE job_id LIKE 'AC%'
  ),
  -- Step 5: Join
  selected_pairings AS
  ( SELECT * FROM all_employees_2 CROSS JOIN selected_jobs
  ),
  -- Step 6: Projection
  current_job_titles AS
  ( SELECT employee_id, job_id FROM employees
  ),
  -- Step 7: Projection
  previous_job_titles AS
  ( SELECT employee_id, job_id FROM job_history
  ),
  -- Step 8: Union
  complete_job_history AS
  ( SELECT * FROM current_job_titles
  UNION
  SELECT * FROM previous_job_titles
  ),
  -- Step 9: Difference
  nonexistent_pairings AS
  ( SELECT * FROM selected_pairings
  MINUS
  SELECT * FROM complete_job_history
  ),
  -- Step 10: Projection
  ineligible_employees AS
  ( SELECT employee_id FROM nonexistent_pairings
  )
-- Step 11: Difference
SELECT * FROM all_employees_1
MINUS
SELECT * FROM ineligible_employees

The sequence of subqueries in the above SQL statement hopefully doesn’t need much explanation. An employee_id that occurs in a non-existent pairing of an employee_id with a job_id is an ineligible employee. Eliminating all such ineligible employees from the list of all employees therefore yields the desired result; that is, employees who have worked in all accounting job classifications.

The five relational operations in Table 1 are not only sufficient—as proved by Dr. Codd—to implement any requirement expressed in relational calculus but also “primitive,” meaning that none of them can be eliminated without compromising sufficiency. In other words, none of them is a combination of the others. Composite relational operations can be devised by combining these five operations; examples include inner join, intersection, outer join, semi-join, anti-join, and division. Inner join is obviously a combination of cartesian join and selection. Intersection can also be constructed using the operations in Table 1 as follows: A INTERSECTION B = A MINUS (A MINUS B). The “Venn diagram” diagram below makes it clear. I leave the rest as a research exercise for you.

When writing SQL, we don’t pay any attention to relational algebra and that’s by design. Codd was of the opinion that “Requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language.” (Relational Completeness of Data Base Sublanguages). With the exception of the union operation, the original version of SQL was based on relational calculus though, over time, other elements of relational algebra like difference (minus), intersection, and outer join crept in. The following SQL statement is the more traditional way of stating the previous query. Note the use of the EXISTS keyword which is the hallmark of relational calculus. This version can be paraphrased as “find employees such that there does not exist an accounting job classification which these employees have not held.”

SELECT employee_id
FROM employees e
WHERE NOT EXISTS
  (SELECT job_id
  FROM jobs j
  WHERE job_id LIKE 'AC%'
  AND NOT EXISTS
    (SELECT *
    FROM
      ( SELECT employee_id, job_id FROM job_history
      UNION
      SELECT employee_id, job_id FROM employees
      )
    WHERE employee_id = e.employee_id
    AND job_id        = j.job_id
    )
  )

As I said before, the whole reason why you and I are so interested in “EXPLAIN PLAN” is because it documents the sequence of relational algebra operations that Oracle Database used at run-time to execute any particular query; that is, it documents the query execution plan used by Oracle Database. Here is the query execution plan that I got while testing the above SQL statement. Follow my blog and wiki postings if you want to learn how to read these plans.

Plan hash value: 842709185

--------------------------------------------------------------------
| Id  | Operation                        | Name                    |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                         |
|   1 |  INDEX FULL SCAN                 | EMP_EMP_ID_PK           |
|   2 |   INDEX RANGE SCAN               | JOB_ID_PK               |
|   3 |    VIEW                          |                         |
|   4 |     SORT UNIQUE                  |                         |
|   5 |      UNION-ALL                   |                         |
|   6 |       TABLE ACCESS BY INDEX ROWID| JOB_HISTORY             |
|   7 |        INDEX RANGE SCAN          | JHIST_EMP_ID_ST_DATE_PK |
|   8 |       TABLE ACCESS BY INDEX ROWID| EMPLOYEES               |
|   9 |        INDEX UNIQUE SCAN         | EMP_EMP_ID_PK           |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "JOBS" "SYS_ALIAS_3" WHERE
              "JOB_ID" LIKE 'AC%' AND  NOT EXISTS (SELECT 0 FROM  ( (SELECT
              "EMPLOYEE_ID" "EMPLOYEE_ID","JOB_ID" "JOB_ID" FROM "JOB_HISTORY"
              "JOB_HISTORY" WHERE "EMPLOYEE_ID"=:B1 AND "JOB_ID"=:B2)UNION (SELECT
              "EMPLOYEE_ID" "EMPLOYEE_ID","JOB_ID" "JOB_ID" FROM "EMPLOYEES"
              "EMPLOYEES" WHERE "EMPLOYEE_ID"=:B3 AND "JOB_ID"=:B4))
              "from$_subquery$_003")))
   2 - access("JOB_ID" LIKE 'AC%')
       filter("JOB_ID" LIKE 'AC%' AND  NOT EXISTS (SELECT 0 FROM  (
              (SELECT "EMPLOYEE_ID" "EMPLOYEE_ID","JOB_ID" "JOB_ID" FROM
              "JOB_HISTORY" "JOB_HISTORY" WHERE "EMPLOYEE_ID"=:B1 AND
              "JOB_ID"=:B2)UNION (SELECT "EMPLOYEE_ID" "EMPLOYEE_ID","JOB_ID"
              "JOB_ID" FROM "EMPLOYEES" "EMPLOYEES" WHERE "EMPLOYEE_ID"=:B3 AND
              "JOB_ID"=:B4)) "from$_subquery$_003"))
   6 - filter("JOB_ID"=:B1)
   7 - access("EMPLOYEE_ID"=:B1)
   8 - filter("JOB_ID"=:B1)
   9 - access("EMPLOYEE_ID"=:B1)

Also read Equivalence of Relational Algebra and Relational Calculus

Iggy Fernandez

1600 2 /
Follow / 7.16.2014 at 4:47pm

Great article!  (A very nice and clear refresher course, even if the reader happens to have encountered all this some time earlier in their career.)

I found a minor nit: shouldn't the implementation of "projection" in Table 1 be "SELECT DISTINCT"?  Because without DISTINCT, the resulting rowset can contain duplicates if the columns selected do not fully include the PK.

Follow / 8.1.2014 at 8:42pm

Thank you for catching the error. I have added the following caution to the description of the Projection operation: “This requires the use of the DISTINCT clause if the column list does not include columns that uniquely identify each row. A strict theoretical approach requires that SELECT DISTINCT always be used in place of SELECT but this would cause older versions of Oracle Database to perform unnecessary sorting so it was advisable to use the DISTINCT clause only when absolutely necessary. Newer versions of Oracle Database make an effort to avoid unnecessary sorting but the old advice still stands.”