Hello, you are not logged in.  Login or sign up
Community >> Blogs
Search Toad World Search

Blogs
Toad and Database Commentaries

 Toad World blogs are a mix of insightful how-tos from Quest experts as well as their commentary on experiences with new database technologies.

Do you have a topic that you'd like discussed?  We'd love to hear from you.  Send us your idea for a blog topic.

Normalization in Non-Gobbledygook Language
 
Location: Blogs Bert Scalzo's Blog    
 Bert Friday, January 12, 2007 8:45 AM
Probably one of the most often discussed and hotly debated topics in both data modeling and database design is that of normalization. Many database designers, architects, DBAs and senior developers have differing positions and/or beliefs on the topic. However quite unfortunately, they often are not communicating with optimal effectiveness due to some fundamental differences in terminology and understanding. The simple solution is to take five minutes to review normalization basics – and thus get everyone on the same page. That’s what this month’s blog entry strives to do.

However before you proceed, always ask yourself what kind of database and application you’re building – because many important design issues vary widely depending on the target system’s nature shown in the chart below.

 
OLTP
ODS
OLAP
DM/DW
Business Focus
Operational
Operational Tactical
Tactical
Tactical
Strategic
End User Tools
Client Server
Web
Client Server
Web
Client Server
Client Server
Web
DB Technology
Relational
Relational
Cubic
Relational
Trans Count
Large
Medium
Small
Small
Trans Size
Small
Medium
Medium
Large
Trans Time
Short
Medium
Long
Long
Size in Gigs
10 – 200
50 – 400
50 – 400
400 - 4000
Normalization
3NF
3NF
N/A
0NF
Data Modeling
Traditional ER
Traditional ER
N/A
Dimensional

For the purposes of this discussion, let’s assume that you’re working on traditional OLTP database and application. Thus you’re hoping to achieve third normal form. Let’s review what that means – and specifically, without using fancy mathematical definitions and/or terms that only PhD’s might understand. Let’s make normalization both easy and fun J

First Normal Form

It’s these last two bullet items where we’ll focus (i.e. for now, we’ll just agree that our tables should have primary keys as part of good relational design). We’re looking for single atomic values that depend upon the key – hence the first item in our little jingle.

So here’s an entity to review for adherence to first normal form:

There are four fundamental requirements for an entity or table table to meet first normal form – all of which must hold true:

·          The entity or table must have a defined and active primary key
·          The primary key is the minimal set of columns that uniquely identify a record
·          Each attribute or column in the table contains a single value (i.e. atomicity)
·          Eliminate repeating groups (i.e. no arrays)

Remember the old “Perry Mason” television show? (Maybe I should have said Matlock so as not to date myself) When a witness takes the stand – the clerk asks “Do you swear to tell the truth, the whole truth, and nothing but the truth – so help you God?” Well, we can utilize that simple little jingle to both define and remember normalization. Our catch phrase will be “… the key, the whole key, and nothing but the key – so help me Codd”. I’ll assume everyone knows who Dr. Codd was – the father of modern relational theory.

 
 
So how does CUSTOMER measure up? There is a primary key, and it’s a single numerical column – can’t get much shorter than that. And all of the columns have simple data types that appear to contain single values (i.e. no complex data types like varrays, nested tables or XML Types – where you’d have to ask some additional questions). So at first glance you might accept CUSTOMER as being in 1NF – but you very well could be wrong L

I actually testified as an expert witness in a court case on this first issue. See the LONG field – what and how is that being used for? After talking to the developers and looking at some code, assume that we find out that all the customer’s orders are collected here as the order number followed by a semi-colon. That’s an array – or a collection of values. That means we’re not in 1NF. And more importantly – we have a real world performance issue.

How would you write the query to join CUSTOMER and ORDER? Since SQL can not handle this – you’d have to programmatically process the join. So instead of the server doing the work on potentially large tables, the client code must process the join logic. Thus you have two bottlenecks – the client CPU and the network traffic to send all the records to the client to examine. Ouch!

OK – that example is a bit extreme. The real problem is with ADDRESS_LINE. In this case we have modeling how an address looks on an envelope rather than its constituent parts. Now in the case of an address that’s probably reasonable – but it sets a dangerous precedent. Look at the example below:

The ORDER contains a bunch of dates. The idea is to track each step of processing the order from start to finish. So every time we find a new “work in progress” step that we want to track, we need to alter the ORDER table and add an optional column. Yuck L

A more elegant solution that removes this hidden repeating group is to create a separate entity for work order tracking as shown here:

Now we can add new new “work in progress” steps by merely adding a row of data to the ORDER_STAGES table. In fact, we can even build a maintenance screen to support this so that end-users can easily make such business oriented modifications without sending this back to information systems. In other words, the resulting system is more flexible.

Second Form

There are two fundamental requirements for an entity or table table to meet second normal form – all of which must hold true:

  • The entity or table must be in first normal form
  •  All attributes or columns must depend upon the whole key
It’s the last bullet item where we’ll focus. If the primary key is constructed upon a single attribute or column, then we are already in 2NF. But if the primary key is composed of multiple attributes or columns (i.e. a concatenated key), then we are looking for all the attributes or columns to depend upon the whole key – hence the second item in our little jingle.
 
So here’s an entity to review for adherence to second normal form:

The test is really simple – does each attribute or column depend upon the whole key? First, does PROD_COST depend on both PROD_NAME and VENDOR_NAME? It might be arguable that this holds. But second, does FACTORY_LOCATION depend upon anything other than VENDOR_NAME? Since the answer is most likely not, then we separate this information into its own entity as shown here:

The only major problem that I see more often than not is where people break a cardinal relational database design rule, and has a single attribute or column contain multiple pieces of information. For example, they might design an attribute or column called PRODUCT_KEY which is a VARCHAR2(20) column where the first 10 characters are the product’s name and the second 10 are the vendor’s name. Thus in effect they have a concatenated key that violates 2NF. My point is that you cannot effectively normalize when attributes or columns don’t represent a single concept – so please don’t do it.

Third Normal Form
 
There are two fundamental requirements for an entity or table table to meet third normal form – all of which must hold true:
  • The entity or table must be in second normal form (and thus also in 1NF)
  • All non-key attributes or columns must depend on nothing but the key
It’s the last bullet item where we’ll focus. Does every non-key attribute or column depend upon nothing but the key – hence the third item in our little jingle.
 
So here’s an entity to review for adherence to third normal form:
 

 

Again the test is quite simple – does each non-key attribute or column depend only on PROD_NUMBER? The last two attributes or columns very clearly depend on just the PROD_CATEGORY, so it violates 3NF. Once again the solution is to simply separate this related information into its own entity as shown here:

 

 

Higher Normal Forms

Most shops will be well served if they can regularly identify and correct for up to 3NF. The primary question of whether to normalize further really depends on whom you ask. The database designers and data architects might like to normalize further, but then they don’t have to write the more complex code to manage such highly structured data. There is a clear tradeoff between code readability and maintainability when you go much higher than 3NF. But it’s my experience that most developers’ skills and patience rapidly erode beyond 3NF. We live in a world of do more faster and with less resources. So let’s not design a perfect database doomed to fail based upon academic or theoretical practice that sacrifices coding efficiency to save cheap disk space. Let’s choose a happy medium that’s effective and yet allows us to remain efficient in the implementation. And you heard that from a person with his PhD – so I guess 20 years in the trenches has forced me to practice what I preach J

Copyright ©2007 Quest Software Inc.
Permalink |  Trackback

Comments (5)   Add Comment
By JCFlack on Tuesday, January 16, 2007 6:18 AM
I appreciate your normalizing the dates into the Order Tracking table. I have a system where I didn't do that, and I've regretted it ever since.

By Bert on Wednesday, January 17, 2007 2:12 PM
Thanks for the feedback - and the confirmation. I'm hoping to make a few little points like this to aid more people who might benefit from such tips. Thanks once again :)

By hillbillyToad on Thursday, January 18, 2007 1:18 PM
Oh Bert, you've brought back some killer nightmares from my Relational Database Theory college course.

Does Toad have any tools that help identify normalization best practices?


By Bert on Thursday, January 18, 2007 2:52 PM
Well sort of. The database health check and code xpert come to mind as tools to automate locating and fixing various database configuration and SQL coding issues. Plus of course the Toad Data Modeler has its own model verification reports to check up on the design. But we're definitely adding more features along these lines all the time - so keep an eye open for even more with Toad Data Modeler 3.0 and Toad 9.1 :)

By Norm on Friday, January 26, 2007 1:52 AM
Nice one Bert - you really need to get your book published !

Cheers,
Norm. [teamT]


Comment:
Add Comment   Cancel 
Search Blog Entries