Previous installment: Shakespeare’s advice for database upgrades
First installment: DON’T PANIC

“Heigh-ho, sing heigh-ho, unto the green holly.
Most friendship is feigning, most loving mere folly.
Then heigh-ho, the holly.
This life is most jolly.”

We can’t help attaching meaning to optimizer statistics. After all, they represent our data, don’t they? So we refresh them as frequently as possible and use the largest sample sizes we can afford. Recently, for the first time in my life, I encountered a group of DBAs who understood that statistics do not have any intrinsic meaning; they are nothing more than a collection of numbers that influence the generation of query plans. These DBAs found that the strategy that worked best for their applications was to treat optimizer statistics just as they would treat program code: statistics were checked into the source code repository alongside the program code. In other words, they did not regularly refresh optimizer statistics like the rest of the world. Even so, the query plans were good enough. But, most importantly, they were blessed with plan stability. In fact, plan stability was their primary goal.
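
For readers who want to try the same approach, Oracle statistics can be exported to a user-created statistics table and carried along with the application code. The following is only a minimal sketch; the schema name HR and the staging table name STATS_REPO are illustrative.

    -- Create a staging table, export the schema statistics into it, and version it.
    EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => 'HR', stattab => 'STATS_REPO');
    EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'HR', stattab => 'STATS_REPO');
    -- Dump STATS_REPO to a flat file (for example, with Data Pump) and check it in
    -- alongside the application code; restore later with DBMS_STATS.IMPORT_SCHEMA_STATS.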

Perhaps I give them too much credit. Perhaps they had not actually concluded that statistics do not have any intrinsic meaning. Perhaps they were just doing what they had found worked best for their applications in practice. I was reminded of my favorite quotes about statistics. But never in my wildest dreams did I imagine that one day I would meet a group that had taken these suggestions as far as this group had.

  • “It astonishes me how many shops prohibit any un-approved production changes yet re-analyze schema stats weekly, acting surprised when things change!”—Don Burleson in reply to a question posed by Mogens Nørgaard on the Oracle-L mailing list.
  • “I am constantly amazed how nonchalantly most shops schedule daily, weekly, or whatever analyze jobs even if they batten down the hatches against changes to the application. Most of the time the changed statistics do not cause a change in access plans (which immediately begs the question why do it then), but every so often the changed statistics cross a threshold to make a different plan appear to be better. It may be better, or it may turn out to be horrible. My point is: shouldn’t that be tested first?”—Wolfgang Breitling in reply to the same question.
  • “Oh, and by the way, could you please stop gathering statistics constantly? I don’t know much about databases, but I do think I know the following: small tables tend to stay small, large tables tend to stay large, unique indexes have a tendency to stay unique, and non-unique indexes often stay non-unique [emphasis added].”—Dave Ensor as remembered by Mogens Nørgaard (August 2007 issue of the NoCOUG Journal)
  • “Monitor the changes in execution plans and/or performance for the individual SQL statements … and perhaps as a consequence re-gather stats. That way, you’d leave stuff alone that works very well, thank you, and you’d put your efforts into exactly the things that have become worse.”—Mogens Nørgaard (August 2007 issue of the NoCOUG Journal)

There is some official precedent for such a strategy. In the case of extremely volatile tables, one of the suggestions in the SQL Tuning Guide is to gather statistics while the tables contain representative data and then lock them. If this strategy works well for extremely volatile tables, might it not also work for less volatile tables?
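
As a rough sketch of that suggestion (the schema and table names below are purely illustrative): gather statistics while the table contains representative data, then lock them so that routine statistics-gathering jobs leave them alone.

    -- Gather statistics at a representative moment, then lock them in place.
    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP', tabname => 'ORDER_QUEUE');
    EXEC DBMS_STATS.LOCK_TABLE_STATS(ownname => 'APP', tabname => 'ORDER_QUEUE');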

And, in the case of system statistics, most organizations use the default values even though they are hardly likely to be representative of any particular database server. That’s a legitimate strategy because, as I said before, statistics are nothing more than a collection of numbers that influence the generation of query plans.
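
If you are curious which system statistics your database is actually using, one way to inspect them is to query the data dictionary directly, as in the sketch below; default noworkload values will be shown unless system statistics have been gathered or set manually.

    -- Show the system statistics currently in effect.
    SELECT pname, pval1
      FROM sys.aux_stats$
     WHERE sname = 'SYSSTATS_MAIN';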

Were these simple applications? Was this a group of junior DBAs? Were these small databases? Was this a small company? No, no, no, and no.

But surely there were some queries whose plans were not good enough? I’m sure there were. In those cases, they probably did what the rest of us do (other than refreshing statistics) when plans are not good enough; that is, they hinted the queries or rewrote them in ways that the optimizer can handle better.
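
Purely as an illustration, using the demonstration query that appears later in this article, a hint can pin the desired access path regardless of what the statistics say:

    -- Force the index access path instead of relying on fresh statistics.
    SELECT /*+ INDEX(e emp_department_ix) */ SUM(e.salary)
      FROM employees e
     WHERE e.department_id = 80;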

You might also question how they handled columns for which the upper and lower bounds could not be predicted in advance. Well, they used column statistics only when necessary, sometimes going so far as to manually set the LOWVAL and HIVAL in SYS.HIST_HEAD$. Likewise, they used column histograms only when necessary.
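
Updating SYS.HIST_HEAD$ directly is not for the faint of heart; the supported route to the same end is DBMS_STATS.SET_COLUMN_STATS. The sketch below sets the minimum and maximum values for EMPLOYEES.DEPARTMENT_ID; every number in it (10, 9999, the distinct count, and so on) is made up for illustration.

    DECLARE
      srec    DBMS_STATS.STATREC;
      numvals DBMS_STATS.NUMARRAY;
    BEGIN
      numvals  := DBMS_STATS.NUMARRAY(10, 9999);  -- illustrative low and high values
      srec.epc := 2;                              -- two endpoints: the minimum and the maximum
      DBMS_STATS.PREPARE_COLUMN_VALUES(srec, numvals);
      DBMS_STATS.SET_COLUMN_STATS(
        ownname => 'HR',
        tabname => 'EMPLOYEES',
        colname => 'DEPARTMENT_ID',
        distcnt => 12,                            -- illustrative number of distinct values
        nullcnt => 1,                             -- illustrative number of nulls
        srec    => srec,
        avgclen => 3);                            -- illustrative average column length in bytes
    END;
    /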

Demonstration

Here is a tiny demonstration showing that the optimizer may be able to construct reasonably efficient query plans even without a full set of statistics. The optimizer can use any information that is available to it; for example, size information and index information. After all, “small tables tend to stay small, large tables tend to stay large, unique indexes have a tendency to stay unique, and non-unique indexes often stay non-unique.”

The following simple query was tested in the HR sample schema with different sets of statistics in Oracle Database 12.1.0.2: select sum(salary) from employees where department_id = 80. The results are shown in the following table. The Consistent Gets column indicates the efficiency of the query plan; the lower the better. The final test involved a full set of statistics but did not result in the most efficient query plan.

Test | Description | Access Path | Consistent Gets
1 | No table statistics | INDEX RANGE SCAN (EMP_DEPARTMENT_IX); TABLE ACCESS BY INDEX ROWID BATCHED (EMPLOYEES) | 2
2 | Dynamic sampling | TABLE ACCESS FULL (EMPLOYEES) | 7
3 | Statistics gathered without histograms | INDEX RANGE SCAN (EMP_DEPARTMENT_IX); TABLE ACCESS BY INDEX ROWID BATCHED (EMPLOYEES) | 2
4 | LOWVAL and HIVAL removed from SYS.HIST_HEAD$ | INDEX RANGE SCAN (EMP_DEPARTMENT_IX); TABLE ACCESS BY INDEX ROWID BATCHED (EMPLOYEES) | 2
5 | Column statistics deleted | INDEX RANGE SCAN (EMP_DEPARTMENT_IX); TABLE ACCESS BY INDEX ROWID BATCHED (EMPLOYEES) | 2
6 | Histograms of maximum size | TABLE ACCESS FULL (EMPLOYEES) | 7
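
To give a flavor of how each test was run, a single step looked roughly like the sketch below (deleting the table statistics and re-running the query under SQL*Plus autotrace); the exact steps varied from test to test.

    -- Remove the table statistics, then observe the plan and the consistent gets.
    EXEC DBMS_STATS.DELETE_TABLE_STATS(ownname => 'HR', tabname => 'EMPLOYEES');

    SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS
    SELECT SUM(salary) FROM employees WHERE department_id = 80;
    SET AUTOTRACE OFF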

The full script and log file are in statistics.zip (attached). Fascinating stuff.

Summary

Optimizer statistics do not have any intrinsic meaning; they are nothing more than a collection of numbers that influence the generation of query plans. Treating statistics like source code can be a valid strategy and can confer a great measure of plan stability.

Further Reading

For varied points of view, please read the panel discussion “Statistics—How and When” in the August 2007 issue of the NoCOUG Journal and the debate “Should we stop analyzing?” initiated by Mogens Nørgaard on the Oracle-L mailing list on 30 Dec 2003.


Copyright © 2015 Iggy Fernandez