Written by:
Bert
Tuesday, May 19, 2009 5:33 PM
The last two posts covered popular misconceptions, or “urban myths”, about measuring and performing benchmarks. That might seem like the “whole enchilada”, but there's one more critical aspect to successful benchmarking – preparation. I encounter numerous people who believe that they can simply assemble the necessary hardware, install the OS, install the database software, create the database, and then have at it. And usually they allocate just two to four weeks for all of this work. I'm reminded of the Boy Scout motto from my youth – “Be prepared”. That's the single most important aspect of benchmarking – and yet one of the most often overlooked.
For example, many people will choose to run an industry standard benchmark such as the TPC-C for OLTP or the TPC-H for data warehousing. Usually they rely on tools such as Quest Software's Benchmark Factory (BMF) to handle the bulk of the work – and that's fine. But they don't read the TPC spec for the test they're about to perform, so when BMF asks them a question such as how big to scale the WAREHOUSE table, they're not really prepared to answer intelligently. The point is that the benchmarking tools assume a certain minimal familiarity with the benchmark and its spec. So if you're going to run an industry standard benchmark such as the TPC-C or TPC-H, you really need to read the spec. The specs can be found at www.tpc.org.
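To make the WAREHOUSE scaling question concrete, here is a minimal sketch of how the TPC-C initial population grows with the number of warehouses. The per-warehouse row counts are taken from the TPC-C spec's database population rules as I understand them (ORDER-LINE is approximate because each order has a variable number of lines); the function name `tpcc_cardinalities` is hypothetical and has nothing to do with how BMF works internally. Check the numbers against the spec revision you are actually running.

```python
# Rows loaded per warehouse in the initial TPC-C population.
# Assumption: values follow the TPC-C spec's population rules; verify them
# against the spec revision you use. ORDER-LINE is an approximation, since
# each order carries between 5 and 15 lines (about 10 on average).
ROWS_PER_WAREHOUSE = {
    "WAREHOUSE": 1,
    "DISTRICT": 10,
    "CUSTOMER": 30_000,
    "HISTORY": 30_000,
    "ORDER": 30_000,
    "NEW-ORDER": 9_000,
    "ORDER-LINE": 300_000,
    "STOCK": 100_000,
}

# ITEM is the one table that does not scale with the warehouse count.
ITEM_ROWS = 100_000


def tpcc_cardinalities(warehouses: int) -> dict[str, int]:
    """Return the approximate initial row count of every TPC-C table."""
    if warehouses < 1:
        raise ValueError("TPC-C needs at least one warehouse")
    counts = {table: rows * warehouses for table, rows in ROWS_PER_WAREHOUSE.items()}
    counts["ITEM"] = ITEM_ROWS
    return counts


if __name__ == "__main__":
    for table, rows in tpcc_cardinalities(10).items():
        print(f"{table:<12}{rows:>12,}")
```

Running it for a few warehouse counts before you answer the tool's scaling prompt gives you a feel for how quickly STOCK and ORDER-LINE come to dominate the load, while DISTRICT stays tiny.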
OK – so now you've read the spec. But remember last week's post about tools not being able to auto-magically build fully optimized database structures (i.e. already clustered or partitioned), since they know nothing about your hardware? Well, there's preparation for that too. After you've read the benchmark spec and have some appreciation for what you're about to do, read the full disclosure reports posted by people who've already performed and optimized these very same tests. You will find those at www.tpc.org too. For example, IBM, Dell and HP post lots of great benchmarking results. You'll find that Appendix A of a disclosure report contains the complete SQL DDL (Data Definition Language) commands they used to achieve their stellar results. So simply find a disclosure report where the database servers and disk storage hardware are similar to yours, and then look to its DDL to see how you might best create your database objects.
You will also see some common themes across the DDL regardless of the hardware. For example, the TPC-C disclosure reports generally CLUSTER the DISTRICT table. The reason is simple. All the thousands of concurrent OLTP transactions must fight each other to update the very few rows in this table. Remember how, back before sequence number generators, we used to keep counters in a sequence table – and how that was a huge bottleneck? Well, that's essentially what the TPC-C benchmark does. The DISTRICT table contains a “counter variable” column that all the transactions for the same district must update. Hence you'll see huge contention on the rows of this table, and when many of those hot rows share a database block, on the block as well. By merely forcing each district row to go into its own block, the majority of the bottleneck is eliminated. This is a technique most Oracle DBAs and developers already know. But you need to understand the test in order to apply your wisdom and insights.
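A rough back-of-the-envelope calculation shows why the DISTRICT rows pile up in so few blocks by default. This is only a sketch: the 8192-byte block size and the 95-byte row size are assumptions chosen for illustration, block header and free-space overhead are ignored, and the helper names are hypothetical.

```python
import math


def rows_per_block(block_bytes: int, row_bytes: int) -> int:
    """How many rows fit in one block, ignoring all block overhead."""
    return block_bytes // row_bytes


def blocks_for_district(warehouses: int, rows_in_each_block: int) -> int:
    """Blocks needed for the DISTRICT table (10 districts per warehouse)."""
    district_rows = warehouses * 10
    return math.ceil(district_rows / rows_in_each_block)


if __name__ == "__main__":
    BLOCK_BYTES = 8192  # assumption: a common database block size
    ROW_BYTES = 95      # assumption: approximate DISTRICT row length

    packed = rows_per_block(BLOCK_BYTES, ROW_BYTES)
    print("rows sharing one block by default:", packed)
    print("blocks for 10 warehouses, packed: ", blocks_for_district(10, packed))
    print("blocks for 10 warehouses, 1/block:", blocks_for_district(10, 1))
```

Under these assumptions, all 100 district rows of a 10-warehouse database land in a couple of blocks, so every new order in the system hammers the same one or two blocks. With one row per block, the same updates are spread across 100 blocks at a trivial cost in space.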
That's why I always tell people – look before you leap. Benchmarking requires lots of preparation and understanding if the results are to meet your expectations. But when you're prepared, benchmarking is really not all that tough – and getting stellar results is not as hard as people might think.