If you think of building a solution with a database as writing a book, then database design, and specifically table design, has to be the prologue. You know you should read it, but many people just skip it and crack on with the book.
The problem with not reading the prologue is that you could find yourself in a horror story, because, let's not forget, you're not supposed to judge a book by its cover. The horror element of table design is that most people don't think about the overhead of using the wrong data types, or are not even aware of what data types are available to them and the intricacies of those choices. A developer may think that the only overhead is one of storage. We're often told that storage is cheap, so that's OK, isn't it? Well, the main point of raising this topic is to tell you that it's not!
The reason for my stance is quite simple: data that is on disk at some point had to reside in memory. The more data that has to be written to disk, the longer the process will take, and this process is overseen by worker threads that are scheduled on the processor. If you're using SQL Server 2012, your processor time is now a lot more expensive than it was in previous versions, thanks to the change to per-core licensing. Better table design alone can increase the efficiency of your processors, and adding an extra core pack is pretty expensive these days. Make your FD's day by cranking up the efficiency of what you already have.
What’s more, if this data is used frequently then it will have stayed in cache and been returned to one or many clients, which means extra data being sent across the network. Of course, if you have slow clients, it’s going to take them longer to retrieve and display this data. If you can reduce this overhead you’ll have a lot more happy end users. The larger your base data, the larger your indexes will be too; this means more space required to rebuild them and a larger log file. In turn this translates to larger log backups and larger full and differential backups, which means even more disk space and possibly more tapes. That in turn could lead to either larger fireproof safes on site or an extra cost for offsite storage.
The impact on SLAs
If that sounds bad, let’s now talk about your SLAs. If you run a shop that uses database mirroring in high-safety mode, or replication, you’ll be keen to keep latency down to a minimum. Just think of the performance penalty you are going to pay hardening the extra data to disk in the two-phase commit of high-safety database mirroring, or to synchronous replicas in an AlwaysOn availability group. You could end up with unintentionally long response times for your users.
Larger log files can also mean a much longer crash recovery process, so do yourself a favour and speak to your developers about keeping their transactions as short as possible, or crash recovery (and I’m including cluster failovers here) will take a lot longer. As an example, let’s take the common decision of using the varchar data type instead of char. As I’m sure you’re aware, varchar removes the need to pad the field out with spaces, which char requires when the value does not fully populate the field. As I mentioned before, this means you could potentially save a lot of space and therefore valuable resources. There’s a famous quote penned by Stan Lee in Spider-Man: “With great power comes great responsibility”.
You do need to be responsible when using varchar, as there is an overhead to doing so: specifically, 2 bytes per variable-length attribute in the record. In my time as a DBA I have seen such strange design choices as a varchar(1). That choice actually resulted in the attribute having a real length of three bytes, making it three times less efficient than a char(1). What really annoyed me about this attribute was not that it should have been a char(1), but that it was being used as a Boolean flag, which meant it should actually have been a bit.
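To make the comparison concrete, here is a minimal sketch (the table and column names are my own invention, not from the original example) showing the three ways of storing that yes/no flag:

```sql
-- Hypothetical illustration of three ways to store a yes/no flag.
-- varchar(1): 1 byte of data + 2-byte variable-length offset = 3 bytes
-- char(1):    1 byte, fixed length
-- bit:        up to 8 bit columns share a single byte in the record
CREATE TABLE dbo.FlagDemo
(
    FlagAsVarchar varchar(1) NOT NULL,  -- 3 bytes per row
    FlagAsChar    char(1)    NOT NULL,  -- 1 byte per row
    FlagAsBit     bit        NOT NULL   -- 1/8th of a byte per row
);
```

Over 10 million rows, that varchar(1) flag costs roughly 30 MB where a bit column would need closer to 1.25 MB.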
Another common decision is to use a GUID as a clustered index key. At 16 bytes it creates a very wide clustering key, which might not seem a problem until you consider that every nonclustered index has to include the clustered key. This is why many data architects will choose a surrogate key such as an integer-based identity attribute: at only 4 bytes, it is some 12 bytes smaller per row in every index on that table. Every index, unless it is a filtered index, will have one row in the index per row in the physical table, even if the data is not populated.
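A hedged sketch of the two choices side by side (the table names here are hypothetical, used only to illustrate the key width difference):

```sql
-- A 16-byte GUID clustering key: this key is carried into
-- every nonclustered index on the table.
CREATE TABLE dbo.OrdersGuidKey
(
    OrderID uniqueidentifier NOT NULL DEFAULT NEWID(),
    CONSTRAINT PK_OrdersGuidKey PRIMARY KEY CLUSTERED (OrderID)
);

-- A 4-byte integer identity surrogate key: 12 bytes narrower
-- per row in the clustered index AND in every nonclustered index.
CREATE TABLE dbo.OrdersIntKey
(
    OrderID int IDENTITY(1,1) NOT NULL,
    CONSTRAINT PK_OrdersIntKey PRIMARY KEY CLUSTERED (OrderID)
);
```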
Dates are a popular way to waste space; many applications either do not need the granularity of time that datetime provides, or do not need the time element at all. Datetime is 8 bytes, smalldatetime is 4 bytes, and date is only 3 bytes.
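You can verify those storage sizes for yourself with DATALENGTH:

```sql
-- DATALENGTH returns the number of bytes used to store each value.
SELECT DATALENGTH(CAST(GETDATE() AS datetime))      AS DateTimeBytes,      -- 8
       DATALENGTH(CAST(GETDATE() AS smalldatetime)) AS SmallDateTimeBytes, -- 4
       DATALENGTH(CAST(GETDATE() AS date))          AS DateBytes;          -- 3
```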
Let’s create a fictional table that exhibits some of the problems I have outlined, over the not unreasonable amount of 10 million records.
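The original demo script isn’t reproduced here, but a table along these lines would do (the names and columns are my own sketch, combining the anti-patterns discussed above):

```sql
-- Hypothetical "first attempt" table: a GUID clustering key,
-- a varchar(1) used as a Boolean flag, and datetime where the
-- time portion is never used.
CREATE TABLE dbo.CustomerBad
(
    CustomerID  uniqueidentifier NOT NULL DEFAULT NEWID(), -- 16 bytes
    IsActive    varchar(1)       NOT NULL,                 -- 3 bytes
    DateOfBirth datetime         NOT NULL,                 -- 8 bytes
    CONSTRAINT PK_CustomerBad PRIMARY KEY CLUSTERED (CustomerID)
);
```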
I’ll also do the same, but with a slightly more thoughtfully designed table.
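Again as a sketch rather than the original script, a more thoughtful version of the same fictional table might look like this:

```sql
-- The same hypothetical table with more informed data type choices:
-- an integer identity key, a bit flag, and a date column.
CREATE TABLE dbo.CustomerGood
(
    CustomerID  int IDENTITY(1,1) NOT NULL, -- 4 bytes instead of 16
    IsActive    bit               NOT NULL, -- 1 bit instead of 3 bytes
    DateOfBirth date              NOT NULL, -- 3 bytes instead of 8
    CONSTRAINT PK_CustomerGood PRIMARY KEY CLUSTERED (CustomerID)
);
```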
Let’s take a look at the size of these two tables:
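One quick way to make that comparison yourself is sp_spaceused (the table names below are placeholders; substitute whatever your two test tables are called):

```sql
-- Reports rows, reserved space, data size and index size per table.
EXEC sp_spaceused N'dbo.YourBadTable';
EXEC sp_spaceused N'dbo.YourGoodTable';
```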
As you can see, the second table is smaller, which will benefit us in a number of ways, just from making some informed choices about the data types we are going to use. The next screenshot shows what this means for SQL Server. Yes, I know using “SELECT *” is bad practice, but it illustrates the point, and you can bet it is happening in production systems right across the world as you read this.
To end this section on a positive note, I’m going to spin what we have covered on its head. Optimizing your table design before you start allows you to:
If you’re a developer or architect, hopefully this has been of some use to you and you’ll now be able to appreciate the knock-on effect that table design has on the resource usage of the entire SQL Server. If you’re a DBA, use resources like this to help educate your development team and make your life easier!
To view the entire Richard Douglas webcast, go to http://bit.ly/1esnaHZ