A series of posts in response to Tim Ford’s #EntryLevel Challenge.

Part I: Getting started with Oracle Database

Part II: What is a Database?

Part III: What is a Relational Database?

Part IV: What is a Database Management System?

Part V: What makes a Database Management System relational?

What Is a Relational Database?

Relational database theory was laid out by Codd in 1970 in a paper titled “A Relational Model for Data for Large Shared Data Banks.” His theory was meant as an alternative to the “programmer as navigator” paradigm that was prevalent in his day.

In pre-relational databases, records were chained together by pointers, as illustrated in the following figures. Each chain has an owner and zero or more members. For example, all the employee records in a department could be chained to the corresponding department record in the departments table. In such a scheme, each employee record points to the next and previous records in the chain as well as to the department record. To list all the employees in a department, you would first navigate to the unique department record (typically using the direct-access technique known as hashing) and then follow the chain of employee records.

This scheme was invented by Charles Bachman, who received the ACM Turing Award in 1973 for his achievement. In his Turing Award lecture, titled “The Programmer as Navigator,” Bachman enumerated seven ways in which you can navigate through such a database:

  1. Records can be retrieved sequentially.
  2. A specific record can be retrieved using its physical address if it’s available.
  3. A specific record can be retrieved using a unique key. Either a unique index or hash addressing makes this possible.
  4. Multiple records can be retrieved using a non-unique key. A non-unique index is necessary.
  5. Starting from an owner record, all the records in a chain can be retrieved.
  6. Starting from any member record in a chain, the prior or next record in the chain can be retrieved.
  7. Starting at any member record in a chain, the owner of the chain can be retrieved.

Bachman noted, “Each of these access methods is interesting in itself, and all are very useful. However, it is the synergistic usage of the entire collection which gives the programmer great and expanded powers to come and go within a large database while accessing only those records of interest in responding to inquiries and updating the database in anticipation of future inquiries.”

An example of a pre-relational database technology is so-called network database technology, one of the best examples of which was DEC/DBMS, created by Digital Equipment Corporation for the VAX/VMS and OpenVMS platforms—it still survives today as Oracle/DBMS. Yes, it’s strange but it’s true—Oracle, the maker of the world’s dominant relational database technology, also sells a pre-relational database technology. According to Oracle, Oracle/DBMS is a very powerful, reliable, and sophisticated database technology that has continued relevance and that Oracle is committed to supporting.

In Bachman’s scheme, you need to know the access paths defined in the database. In 1979, Codd made the startling statement that programmers need not and should not have to be concerned about the access paths defined in the database. The opening words of the first paper on the relational model were, “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)” (“A Relational Model of Data for Large Shared Data Banks”).

Productivity and ease of use were the stated goals of the relational model. In “Normalized Data Base Structure: A Brief Tutorial” (1971), Codd said,

What is less understandable is the trend toward more and more complexity in the data structures with which application programmers and terminal users directly interact. Surely, in the choice of logical data structures that a system is to support, there is one consideration of absolutely paramount importance—and that is the convenience of the majority of users. … To make formatted data bases readily accessible to users (especially casual users) who have little or no training in programming we must provide the simplest possible data structures and almost natural language. … What could be a simpler, more universally needed, and more universally understood data structure than a table?

As IBM researcher Donald Chamberlin recalled later (The 1995 SQL Reunion: People, Projects, and Politics):

[Codd] gave a seminar and a lot of us went to listen to him. This was as I say a revelation for me because Codd had a bunch of queries that were fairly complicated queries and since I’d been studying CODASYL, I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff. Codd would sort of write them down as one-liners. These would be queries like, “Find the employees who earn more than their managers.” He just whacked them out and you could sort of read them, and they weren’t complicated at all, and I said, “Wow.” This was kind of a conversion experience for me, that I understood what the relational thing was about after that.

Donald Chamberlin and fellow IBM researcher Raymond Boyce went on to create the first relational query language based on Codd’s proposals and described it in a short paper titled “SEQUEL: A Structured English Query Language” (1974). The acronym SEQUEL was later shortened to SQL because SEQUEL was a trademarked name.

Codd emphasized the productivity benefits of the relational model in his acceptance speech for the 1981 Turing Award:

It is well known that the growth in demands from end users for new applications is outstripping the capability of data processing departments to implement the corresponding application programs. There are two complementary approaches to attacking this problem (and both approaches are needed): one is to put end users into direct touch with the information stored in computers; the other is to increase the productivity of data processing professionals in the development of application programs. It is less well known that a single technology, relational database management, provides a practical foundation to both approaches.

In fact, the ubiquitous data-access language SQL was originally intended for the use of non-programmers. As explained by the creators of SQL in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners. [emphasis added] It’s for this class of users that SEQUEL is intended.”

Secret Sauce

Codd’s secret sauce was relational algebra, a collection of operations that could be used to combine tables. Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like selection, projection, union, difference, and join (more precisely, Cartesian join), listed in the following table.

Operator

Definition

Selection

Form another table by extracting a subset of the rows of a table of interest using some criteria. This can be expressed in SQL as follows (the * character is a wildcard that matches all columns in the table that is being operated on)”:

select *
from [table]
where [criteria]

Projection

Form another table by extracting a subset of the columns of a table of interest. Any duplicate rows that are formed as a result of the projection operation are eliminated:

select [column list]
from [table]

Union

Form another table by selecting all rows from two tables of interest. If the first table has 10 rows and the second table has 20 rows, then the resulting table will have at most 30 rows, because duplicates are eliminated from the result:

select *
from [first table]
union
select *
from [second table]

Difference

Form another table by extracting from one table of interest only those rows that don’t occur in a second table:

select *
from [first table]
minus
select *
from [second table]

Join

select *
from [first table], [second table]

Why did Codd name this relational algebra? Codd based his theory on rigorous mathematical principles and used the esoteric mathematical term relation to denote what is loosely referred to as a table.

Worked Example

Let’s use the five operations defined in the above able to answer this question: “Which employees have worked in all accounting positions—that is, those for which the job_id starts with the characters AC?” The current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. The list of job titles is held in the jobs table.

Here is the description of the employees table:

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
EMPLOYEE_ID                               NOT NULL NUMBER(6)
FIRST_NAME                                         VARCHAR2(20)
LAST_NAME                                 NOT NULL VARCHAR2(25)
EMAIL                                     NOT NULL VARCHAR2(25)
PHONE_NUMBER                                       VARCHAR2(20)
HIRE_DATE                                 NOT NULL DATE
JOB_ID                                    NOT NULL VARCHAR2(10)
SALARY                                             NUMBER(8,2)
COMMISSION_PCT                                     NUMBER(2,2)
MANAGER_ID                                         NUMBER(6)
DEPARTMENT_ID                                      NUMBER(4)

Here is the description of the job_history table:

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
EMPLOYEE_ID                               NOT NULL NUMBER(6)
START_DATE                                NOT NULL DATE
END_DATE                                  NOT NULL DATE
JOB_ID                                    NOT NULL VARCHAR2(10)
DEPARTMENT_ID                                      NUMBER(4)

Here is the description of the jobs table:

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
JOB_ID                                    NOT NULL VARCHAR2(10)
JOB_TITLE                                 NOT NULL VARCHAR2(35)
MIN_SALARY                                         NUMBER(6)
MAX_SALARY                                         NUMBER(6)

Given these three tables, you can execute the following steps to answer the business question that has been posed. Each step builds on the previous step and a SQL query slowly takes shape. You can follow along in the SQL worksheet in SQL Developer.

1. This step uses only the employee_id column from the employees table. To do this, you need the projection operation. The employees table contains 107 rows, so this command also produces 107 rows. Only the first five rows are shown here:

select employee_id
from employees

        100
        101
        102
        103
        104

Note that certain formatting aspects of SQL statements, such as lowercase, uppercase, white space, and line feeds, are immaterial except in the case of string literals—that is, strings of characters enclosed within quote marks.

2. This step uses only the job_id column from the jobs table. To obtain this, you need the projection operation again. The jobs table contains 19 rows, so this command also produces 19 rows. Only the first five rows are shown here:

select job_id
from jobs

AC_ACCOUNT
AC_MGR
AD_ASST
AD_PRES
AD_VP

3. Remember that the result of any relational operation is always another table. You need a subset of rows from the table created by the projection operation used in step 2. To obtain this, you need the selection operation, as shown in the following SQL command and its results. * is a wildcard that matches all the columns of a table. % is a wildcard that matches any combination of characters. Note that the “table” that is operated on is actually the SQL command from step 2. The result contains only two rows:

select *
from (select job_id from jobs)
where job_id like 'AC%'

AC_ACCOUNT
AC_MGR

You can streamline this SQL command as follows. This version expresses both the projection from step 2 and the selection from step 3 using a unified syntax. Read it carefully, and make sure you understand it:

select job_id from jobs
where job_id like 'AC%'

4. You need the job title of every employee; that is, you need the job_id column from the employees table. The employees table has 107 rows, so the resulting table also has 107 rows; five of them are shown here:

select employee_id, job_id
from employees

        100 AD_PRES
        101 AD_VP
        102 AD_VP
        103 IT_PROG
        104 IT_PROG

5. Next, you need the employee_id and job_id columns from the job_history table. The jobs table contains 19 rows, so this command also produces 19 rows. Only the first five rows are shown here:

select employee_id, job_id
from job_history

        101 AC_ACCOUNT
        200 AC_ACCOUNT
        101 AC_MGR
        200 AD_ASST
        102 IT_PROG

6. Remember that the current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. The complete job history of any employee is therefore the union of the tables created in step 4 and step 5:

select employee_id, job_id
from employees
union
select employee_id, job_id
from job_history

7. You need to join the tables created in step 1 and step 3. The resulting table contains all possible pairings of the 107 emp_id values in the employees table with the two job_id values of interest. There are 214 such pairings, a few of which are shown next:

select *
from
  (select employee_id from employees),
  (select job_id from jobs where job_id like 'AC%')

        100 AC_ACCOUNT
        101 AC_ACCOUNT
        102 AC_ACCOUNT
        103 AC_ACCOUNT
        104 AC_ACCOUNT

You can streamline this SQL command as follows. This version expresses the projections from step 1 and step 2, the selection from step 3, as well as the join in the current step using a unified syntax. Read it carefully, and make sure you understand it. This pattern of combining multiple projection, join, and selection operations is the most important SQL pattern, so you should make sure you understand it. Note that you prefix the table names to the column names. Such prefixes are required whenever there are ambiguities. In this case, there is a job_id column in the employees table in addition to the jobs table:

select employees.employee_id, jobs.job_id
from employees, jobs
where jobs.job_id like 'AC%'

8. From the table created in step 7, you need to subtract the rows in the table created in step 6! To do this, you need the difference operation, the appropriate SQL keyword being minus. The resulting table contains those pairings of employee_id and job_id that are not found in the job_history table. Here is the SQL command you need. The resulting table contains exactly 211 rows, a few of which are shown:

select employees.employee_id, jobs.job_id
from employees, jobs
where jobs.job_id like 'AC%'
minus
select employee_id, job_id
from job_history

        100 AC_ACCOUNT
        100 AC_MGR
        102 AC_ACCOUNT
        102 AC_MGR
        103 AC_ACCOUNT

9. Thus far, you’ve obtained pairings of employee_id and job_id that are not found in the employee’s job history—that is, the table constructed in step 6. Any employee who participates in such a pairing is not an employee of interest; that is, any employee who participates in such a pairing isn’t an employee who has worked in all positions for which the job_id starts with the characters AC. The first column of this table therefore contains the employees in which you’re not interested. You need another projection operation:

select employee_id from
(
  select employees.employee_id, jobs.job_id
  from employees, jobs
  where jobs.job_id like 'AC%'
  minus
  select employee_id, job_id
  from job_history
)

        100
        100
        102
        102
        103

10. You’ve identified the employees who don’t satisfy your criteria. All you have to do is to eliminate them from the table created in step 1! Exactly one employee satisfies your criteria:

select employee_id
from employees
minus
select employee_id from
(
  select employees.employee_id, jobs.job_id
  from employees, jobs
  where jobs.job_id like 'AC%'
  minus
  (
    select employee_id, job_id
    from job_history
    union
    select employee_id, job_id
    from job_history
  )
)

        101

You had to string together 10 operations—5 projection operations, 1 selection operation, 1 union operation, and 2 difference operations—to produce the final answer:

The Definition At Last

We’re now ready to define the term “relational database":

A relational database is a database in which: The data is perceived by the user as tables (and nothing but tables) and the operators available to the user for (for example) retrieval are operators that derive “new” tables from “old” ones.—Chris Date, An Introduction to Database Systems, 8th ed. (Addison-Wesley, 2003)

Part I: Getting started with Oracle Database

Part II: What is a Database?

Part III: What is a Relational Database?

Part IV: What is a Database Management System?

Part V: What makes a Database Management System relational?

Excerpted from Beginning Oracle Database 12c Administration