Last time, in Part 4, I reviewed setting statistics collection parameters and the use of dbms_stats.set_*_prefs. In today's post, I will review the individual collection parameters and their values in more detail.

Collection Parameters
In order to execute a statistics collection, you either use the default parameter values or explicitly supply a modified, non-default value. In the dbms_stats.set_*_prefs procedures, you identify the parameter by name (PNAME) and the value to which you wish it to be set (PVALUE). Here's the most commonly used parameters as I provided in the last post:

CASCADE
DEGREE
ESTIMATE_PERCENT
METHOD_OPT
NO_INVALIDATE
GRANULARITY
PUBLISH
INCREMENTAL
STALE_PERCENT
OPTIONS

There are a few other variants, but these are the basic parameters you'll use primarily.

CASCADE
Default value: DBMS_STATS.AUTO_CASCADE
This parameter is used to indicate if you want statistics to be collected for not just the table, but for any indexes on that table as well. The default value setting allows Oracle to decide whether or not to collect the indexes (it does so most of the time).

DEGREE
Default value: NULL
This parameter allows the collection on the specified object to occur in parallel. This can be used to decrease collection times by spreading the work out across multiple parallel sessions. Note that the default value of NULL means that Oracle will use the default degree from the object definition. Therefore, if your object has a degree of 1, then by default, it would not collect stats on that object in parallel. Typically, you must set this value explicitly to get the parallel execution level you want. You may also use the DBMS_STATS.AUTO_DEGREE constant to allow Oracle to automatically select the degree based on parallel-related instance parameters, the number of CPUs you have available, and the size of the object. This sounds like a good idea, but be careful as very large numbers of parallel sessions may be created when Oracle is left to its own devices.

ESTIMATE_PERCENT
Default: DBMS_STATS.AUTO_SAMPLE_SIZE
This parameter determines the percentage of rows to be used in the sample upon which the stats are created. Explicitly specifying NULL means to use 100% of the rows so be careful with using NULL since 100% collections on large objects can take a long time and consume a great deal of your database resources. The default is your best bet at least as an initial starting point and Oracle will determine a percentage range between .000001 and 100. Note that the larger the object, the smaller the sample actually has to be in order to compute a "statistically relevant" value.

METHOD_OPT
Default: 'FOR ALL COLUMNS SIZE AUTO'
This parameter determines how statistics are collected on columns, specifically whether or not histograms are created. The default value allows Oracle to determine which columns need to have histograms based on the distribution of data contained in those columns as well as the workload on those columns (i.e. how often and in what manner the column is used in SQL). Sometimes the default setting is a bit too aggressive (in my opinion) in creating histograms on columns with a large number of distinct values. Often these histograms provide little to no benefit and often serve only to take up space. I recommend clear evaluation of the histograms that get created using SIZE AUTO and setting preferences to adjust to only collect histograms on specific columns that really need them as much as you possibly can.

Other options for this parameter include SIZE REPEAT and SIZE SKEWONLY. The REPEAT option causes histograms to only be collected on columns that already have histograms. While this can save some time, it can also mean that it's possible to miss getting histograms on columns that are now skewed but weren't previously. The SKEWONLY option is similar to AUTO but it doesn't take the workload on the columns into consideration. The "bad" part with this option is that it will collect histograms on all the columns only to throw them away for any columns that turns out not to be skewed. That means it takes a good long while to collect stats...so be careful with this one.

NO_INVALIDATE
Default: DBMS_STATS.AUTO_INVALIDATE
This parameter determines whether or not Oracle will invalidate any dependent cursors related to the object being collected. So, if you have previously executed SQL and it is cached in the shared pool, Oracle will determine when to invalidate that SQL so that a new execution plan can be produced using the newly collected stats. Generally speaking, I'd prefer to have the SQL immediately invalidated. That's really the point of collecting stats anyway - to get a new, "better" execution plan. So, I'd rather have the new plan parsed right away versus waiting some indefinite time for the old SQL to age out of the cache and a new parse to be done. Therefore, I prefer to explicitly set this parameter to FALSE so that all cursors will be immediately invalidated and new plans created.

GRANULARITY
Default: AUTO
This parameter relates only to tables that have partitions with the default value allowing Oracle to determine the appropriate collection setting based on the partitioning type of the table. Basically, the parameter controls if collection will be done at either a local partition, subpartition, or global level (or a combination of any/all of these).

PUBLISH
Default: TRUE
This parameter tells Oracle to either immediately make the collected stats visible and usable (TRUE) or to hold them in pending mode (FALSE). If you do not publish the stats, you can test how they will perform by setting the parameter OPTIMIZER_USE_PENDING_STATS = TRUE for a session and running a tests from your application. You can compare the results to the current/old stats and then publish the stats when you feel comfortable they do not cause any plan regressions you wish to avoid.

INCREMENTAL
Default: FALSE
This parameter is also related specifically to partitioned objects. It determines if the stats can be maintained by only doing incremental collections on "new" partitions or if a full scan of all partitions has to be performed. There are a couple of good articles about incremental stats over at Optimizer team blog here and here.

STALE_PERCENT
Default: 10 percent
This parameter determines the percentage change in number of rows in a table that must change before the table is considered stale. This setting is used by the automatic stats collection job and whenever you set OPTIONS=>'GATHER STALE'.

OPTIONS
Default: GATHER
This parameter is used to provide additional qualification on which objects stats should be collected. The default means to gather stats on all objects while other options limit the collection as follows:
GATHER AUTO - allows Oracle to determine which objects need stats collected and how to collect them; when using this option, all other parameter settings are ignored.
GATHER STALE - checks the at *_TAB_MODIFICATIONS view to determine if objects have reached or exceeded the STALE_PERCENT limit and if so, collects stats on them.
GATHER EMPTY - only collects stats on objects that have no stats present.

You can also use the LIST AUTO, LIST STALE, and LIST EMPTY to report back which objects would be included in the corresponding GATHER option collection.

Summary
Understanding the parameters used in statistics collections will allow you to better adjust values when the defaults don't work optimally in your database. Combined with how to set specific parameter preferences (as we reviewed in Part 4), you now have the information to manage the collection parameters more adeptly.

A Look Ahead
In Part 6, I'll review how to compare statistics collected at different times to assist in performance problem diagnosis related to statistics changes.