It is impossible to over-estimate the importance of database design on the effectiveness and efficiency of application systems. The first step, of course, is to create a logical data model of the business information that must be stored in, and accessed through, the database. This is a non-trivial task, but it is not the focus of today’s blog post, which is implementing optimal and well-designed database schemata.
Oftentimes the database design does not get the attention it deserves. This can occur for numerous reasons such as:
  • Insufficient specifications and/or poor logical data modeling
  • Not enough time in the development schedule
  • Too  many changes occurring throughout the development cycle
  • Database design assigned to, or performed by novices

 

And even when the database design is being performed by experienced professionals with sufficient time and a solid logical model, it is very easy for design flaws to creep into the database. This is especially the case for larger and more complex databases required to support mission critical applications.

 Of course, the first step in constructing a physical database should be transforming the logical design using best practices. The transformation consists of the following:

  • Transforming entities into tables
  • Transforming attributes into columns
  • Transforming domains into data types and constraints
  • Transforming relationships into primary and foreign keys

 

But a simple transformation will not result in a complete and correct physical database design – it is merely the first step.  And design flaws can be introduced even during such a transformation.

 

The process of normalizing your data should be conducted during the logical design phase, but sometimes mistakes are made during the logical modeling process. In simple terms, normalization is the process of identifying the one best place where each fact belongs. A normalized design minimizes data redundancy and optimizes data structures by systematically and properly placing data elements into the appropriate groupings.

  • First normal form (1NF) eliminates repeating groups and non-atomic data from an entity.
  • Second normal form (2NF) ensures that all the attributes of each entity are dependent on the primary key.
  • Third normal form (3NF) ensures that no relationships between attributes exist within an entity.

 

Although normalization is a logical process and does not necessarily dictate physical database design it is good practice to implement normalized physical databases – especially with today’s powerful hardware and database systems. A normalized data model will ensure that each entity is well formed and that each attribute is assigned to the proper entity. And denormalized data structures should always be documented.
A database that is not normalized can be difficult to modify (because facts may be strewn about within the database) and prone to data quality issues (again because one fact may be in more than one place). Failing to achieve appropriate normalization is a significant design flaw.
Design flaws can make it difficult to access a database in an efficient manner. Something as simple as creating columns with inconsistent data type and length across tables can introduce inefficiency. For example, perhaps JOB_CODE is defined as CHAR(10) in one table and CHAR(15) in another. If these columns need to be compared or joined, additional work is required by the DBMS to make the columns comparable.  It is easy to see how a database design issue can make applications inefficient no matter how adept the developers are. And this is just one type of design flaw.
 
There are numerous other types of design flaws that can negatively impact the usability and efficiency of a database implementation (and the applications that use it). Improperly defined constraints (referential, check, uniqueness) can cause data quality problems. Improper indexing (to support constraints and other physical structures) can cause a database to operate inefficiently or even cease operating altogether. In DB2 for z/OS, failing to explicitly name a clustering index causes DB2 to default cluster sequence to the oldest index. Changing index structures, which sometimes require dropping and recreating the index, can cause the data to be ordered in a completely different physical sequence, thereby impacting performance.
Some database design flaws are more subtle. What happens if you create two incompatible check constraints? For example, consider the following: 
 
     CHECK (empno < 100)
     CHECK (empno >= 101)
 
No data can be added to the table because no number is less than 100 and greater than or equal to 101. Of course, this is an extreme example to highlight the problem. Another situation can occur if the default value is not one of the values permitted by the check constraint, for example: 
 
    emp_type  CHAR(8)  DEFAULT 'new'
       CHECK (emp_type IN ('temp',
                           'fulltime',
                           'contract')), ...
 
Cycles are another potential problem that can be created in a physical database schema. A cycle is a referential path that connects a table to itself. This can occur when multiple tables are related back to each other and it looks like a loop when diagrammed. DB2 forbids a table from being delete-connected to itself in a cycle. A table is delete-connected to another table if it is a dependent of a table specified with a CASCADE delete rule.
Nested triggers can also create problems. A trigger is fired by an INSERT, UPDATE, or DELETE operation. However, a trigger can also contain INSERT, UPDATE, or DELETE logic within itself. Therefore, a trigger is fired by a data modification, but can also cause another data modification, thereby firing yet another trigger. When a trigger contains INSERT, UPDATE, and/or DELETE logic (including MERGE statements), the trigger is said to be a nested trigger. If referential integrity is combined with triggers, additional cascading updates and/or deletes can occur.
DB2 restricts the number of nested triggers that can be executed within a single firing event. If this were not done, it could be quite possible to have triggers firing triggers ad infinitum until all of the data was removed from an entire database. DB2 limits this cascading effect to 16 levels to prevent endless looping. If more than 16 levels of nesting occur, the transaction is aborted. Such nesting may restrict certain types of data modification from happening at all because the number of nested calls will always be exceeded. So nesting levels need to be controlled and managed in the database schema to avoid problems.
 And this is by no means a comprehensive list of database design flaws that can cause real problems for application developers and end users. Getting the database design correct is imperative if you wish to have any hope of assuring application performance.