In this article you will learn how the new Oracle 12c dynamic sampling works. I will present cases where it helps the optimizer produce good estimations, together with cases where it does not. Only single-table selectivity and the group by clause will be considered in this first part of a series of articles on Adaptive Dynamic Sampling; the effect of dynamic sampling on joins will be covered in Part II.

1. Single table estimation

This is the query I will be using throughout this article:


SELECT count(*)
FROM   customers
WHERE  cust_state_province = 'CA'
AND    country_id=52790;
 
select segment_name, bytes/1024/1024 MB
from user_segments
where segment_name= 'CUSTOMERS';
 
SEGMENT_NAME                 MB
-------------------- ----------
CUSTOMERS                    13
 
The two columns (cust_state_province and country_id) involved in the predicate part of the above query have histograms. Below is the execution plan of this query when executed for the first time:
SQL_ID  303kpr2tf9x3j, child number 0
-------------------------------------
 
------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1115 |   3341 |
------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA'AND "COUNTRY_ID"=52790))
 
As you can see, there is a significant difference between the estimated and actual row counts (Starts*E-Rows != A-Rows): 1115 estimated versus 3341 actual.
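Where does the 1115 estimate come from? With a histogram on each column but no column group statistics, the optimizer multiplies the individual selectivities as if the columns were independent. Here is a minimal sketch of that arithmetic; the country_id row count below is a hypothetical figure, chosen only to be consistent with the plan above:

```python
# Illustrative sketch of the independence assumption, not Oracle's code.
num_rows = 55500       # rows in CUSTOMERS
rows_ca = 3341         # rows with cust_state_province = 'CA' (the plan's A-Rows)
rows_country = 18521   # hypothetical count for country_id = 52790

sel_ca = rows_ca / num_rows
sel_country = rows_country / num_rows

# Without column group statistics the two selectivities are multiplied
estimated_cardinality = round(num_rows * sel_ca * sel_country)
print(estimated_cardinality)  # -> 1115, the E-Rows in the plan
```

Because 'CA' rows are overwhelmingly also country_id = 52790 rows, the independence assumption understates the real cardinality of 3341.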
 
If we execute the same query again, we will get the following execution plan:
 
SQL_ID  303kpr2tf9x3j, child number 1
-------------------------------------
SELECT count(*)   FROM   sh.customers   WHERE  cust_state_province =
'CA'AND    country_id=52790
 
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3341 |   3341 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA'AND "COUNTRY_ID"=52790))
Note
-----
- statistics feedback used for this statement
 

The 12c statistics feedback feature (the equivalent of 11g cardinality feedback) kicked in and produced a very good estimation, creating a new child cursor n°1 with what is almost certainly an optimal execution plan.
 
This 12c statistics feedback has been possible because, during the first execution, Oracle realized that its estimations were not close to what was actually observed at run time. It therefore set the is_reoptimizable property of child cursor n°0 to 'Y', as shown below:
  
SQL> select child_number, is_reoptimizable
from v$sql
where sql_id = '303kpr2tf9x3j';
 
CHILD_NUMBER I
------------ -
0            Y
 
As such, subsequent query executions will benefit from the information gleaned by the first execution.
 
However, there are many real-life cases where tables have no statistics at all and where dynamic statistics (dynamic sampling in 11g) kicks in. What estimations would the optimizer come up with if asked to optimize the above query against the customers table in this kind of situation? This is what we are going to check hereafter, using the new 12c dynamic_sampling level 11 and executing the initial query again:
 
SQL> alter session set optimizer_dynamic_sampling =11;
 
SQL_ID  fvdd4sy2zrdkq, child number 2
-------------------------------------
SELECT count(*)   FROM   customers   WHERE  cust_state_province =
'CA'AND    country_id=52790
 
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3424 |   3341 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA'AND "COUNTRY_ID"=52790))
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO).
 
As you can notice, a new child cursor n°2 has been created. In this case the optimizer estimations are quite good, but not as perfect as they were with statistics feedback (cursor n°1). Of course, changing the optimizer parameter forces the CBO to compile a new execution plan using the new optimizer environment.
Now consider what estimations the CBO would produce if the shared pool were flushed and adequate statistics were collected, including histograms and extended statistics for the columns involved in the predicate part, as shown below:
 
BEGIN
  dbms_stats.gather_table_stats
    ('sh'
    ,'customers'
    ,method_opt => 'for all columns size skewonly for columns(cust_state_province,country_id) size skewonly'
    );
END;
/
 
alter session set statistics_level=all;
 
alter system flush shared_pool;
 
SELECT count(*)
FROM   customers
WHERE  cust_state_province = 'CA'
AND    country_id=52790;
 
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
 
SQL_ID  303kpr2tf9x3j, child number 0
-------------------------------------
 
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3341 |   3341 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA'AND "COUNTRY_ID"=52790))
 
The estimations made by the CBO are in this case perfect (E-Rows = A-Rows). This is thanks to the extended statistics collected on the cust_state_province and country_id columns (see the call to the dbms_stats package above).
Let's now enable automatic dynamic sampling (ADS) and see how the CBO reacts:
alter system flush shared_pool;
alter session set optimizer_dynamic_sampling=11;
 
SELECT count(*)
FROM   customers
WHERE  cust_state_province = 'CA'
AND    country_id=52790;
 
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
 
SQL_ID  303kpr2tf9x3j, child number 0
-------------------------------------
 
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3424 |   3341 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("CUST_STATE_PROVINCE"='CA'AND "COUNTRY_ID"=52790))
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
 
One remark has to be emphasized here: despite the presence of adequate statistics, when we instructed the CBO to use dynamic sampling it obeyed and ignored those statistics. The optimizer probably considers that with ADS activated it has enough tools to produce very good estimations on its own. In my opinion, however, the optimizer should detect the presence of the column group statistics and use them instead of ADS. This behavior resembles the working mechanism of SQL hints: when the optimizer is hinted to follow a certain path, it will follow that path as long as it is valid, whatever its cost.
Before digging into more detail about this situation, where both adequate statistics and dynamic sampling at level 11 are present, let me explain where the optimizer got this 3424 cardinality from.
As you know, in 12c the CBO trace file (event 10053) does not contain much information about dynamic sampling. Here is what the corresponding trace file contains:

SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CUSTOMERS[CUSTOMERS]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
*** 2014-09-22 02:47:11.625
** Performing dynamic sampling initial checks. **
** Not using old style dynamic sampling since ADS is enabled.
Column (#11):
NewDensity:0.000144, OldDensity:0.000009 BktCnt:55500.000000, PopBktCnt:55500.000000, PopValCnt:145, NDV:145
Column (#11): CUST_STATE_PROVINCE(VARCHAR2)
AvgLen: 11 NDV: 145 Nulls: 0 Density: 0.000144
Histogram: Freq  #Bkts: 145  UncompBkts: 55500  EndPtVals: 145  ActualVal: yes
Column (#13):
NewDensity:0.000676, OldDensity:0.000009 BktCnt:55500.000000, PopBktCnt:55500.000000, PopValCnt:19, NDV:19
Column (#13): COUNTRY_ID(NUMBER)
AvgLen: 5 NDV: 19 Nulls: 0 Density: 0.000676 Min: 52769.000000 Max: 52791.000000
Histogram: Freq  #Bkts: 19  UncompBkts: 55500  EndPtVals: 19  ActualVal: yes
Table: CUSTOMERS  Alias: CUSTOMERS
Card: Original: 55500.000000    >>Single Tab Card adjusted from 1114.870631 to 3423.832500 due to adaptive dynamic sampling
Rounded: 3424  Computed: 3423.832500  Non Adjusted: 1114.870631
 
Note how the trace file indicates that adaptive dynamic sampling has been used to adjust the cardinality from 1114.870631 to 3423.832500. The following question then imposes itself: how has this 3423.832500 value been determined?
When I enabled SQL trace (event 10046), I got the following information:
 
SQL ID: 8z1xwcv87jxb2
Plan Hash: 2105012006
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "CUSTOMERS")  */ 1 AS C1
FROM "SH"."CUSTOMERS"SAMPLE BLOCK(51.5796, 8) SEED(1)  "CUSTOMERS" WHERE
("CUSTOMERS"."CUST_STATE_PROVINCE"='CA') AND ("CUSTOMERS"."COUNTRY_ID"=
52790)) innerQuery
 
call     count       cpu    elapsed  disk  query    current    rows
------- ------  ------ ----------  ----- -----    -----   -------
Parse        1     0.00       0.00      0     0        0        0
Execute      1     0.00       0.00      0     0        0        0
Fetch        1     0.00       0.00      0     798      0        1
------- ------  ------ ----------  ----- ------   -----    ------
total        3     0.00       0.00      0     798      0        1
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 105     (recursive depth: 1)
 
Rows     Row Source Operation
-------  ---------------------------------------------------
1        RESULT CACHE  7t2g6717d8zkq49mdxstnxqnax (cr=798 pr=0 pw=0 time=6509 us)
1        SORT AGGREGATE (cr=798 pr=0 pw=0 time=6407 us)
1766      TABLE ACCESS SAMPLE CUSTOMERS (cr=798 pr=0 pw=0 time=6236 us cost=304
size=48244 card=1723)
 
As you may have noticed from the RESULT CACHE operation in the above row source execution plan, the result has been stored in a cache for future executions (shared by all subsequent executions of the query). The sampling mechanism identified a cardinality of 1766, as shown in the same row source plan. We can also point out that this sampling mechanism uses a strange second argument (8), which represents the sample group size used to sample the customers table (SAMPLE BLOCK(51.5796, 8)). This SAMPLE BLOCK(percent, sample_group_size) option is not yet documented by Oracle. It seems that when dynamic sampling is set to the new 12c level (11), Oracle uses a sample_group_size of 8, while for other dynamic sampling levels it uses a sample_group_size of 1.


And, again, a new legitimate question imposes itself: where does this 51.5796 value come from? In order to estimate the customers table cardinality, Oracle actually scanned about half of the table (hence the 51.5796 value). This method, of course, adds an extra time penalty to the optimization phase of the underlying query; but subsequent executions benefit from this sampling estimation without paying the parsing penalty again. This method is known as the adaptive sampling method. I will prepare another article about calculating the (optimal) sample size, the exact adaptive sampling algorithm, and how Oracle has implemented it.
So we can figure out that with a sampling percentage of roughly 50%, the row source cardinality found by Oracle is 1766. Oracle then extrapolated that a 100% sample would yield a cardinality of 1766*100/51.5796 = 3423.83, rounded to 3424.
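The extrapolation suggested by the trace file is simply the sampled count scaled up by the sampling percentage:

```python
# Scale the count observed in the block sample up to a 100% sample.
sample_pct = 51.5796   # SAMPLE BLOCK percentage from the 10046 trace
sampled_card = 1766    # matching rows found in the sample

extrapolated = sampled_card * 100 / sample_pct
print(round(extrapolated))  # -> 3424, matching the Rounded value in the 10053 trace
```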
Bear in mind that with dynamic statistics set at a level different from 11, the optimizer counts the Actual Sample Size (ASS) together with the Filtered Sample Cardinality (FSC), and then calculates the selectivity using:

selectivity =FSC/ASS


And the estimated cardinality via the following formula:

cardinality = num_rows*selectivity
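Using hypothetical sample figures (the ASS and FSC values below are invented purely for illustration), the two formulas combine as:

```python
# Pre-level-11 dynamic sampling arithmetic, with made-up sample numbers.
num_rows = 55500   # table cardinality from dictionary statistics
ass = 5500         # Actual Sample Size: rows examined in the sample
fsc = 331          # Filtered Sample Cardinality: sampled rows passing the filter

selectivity = fsc / ass
cardinality = round(num_rows * selectivity)
print(cardinality)  # -> 3340
```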


Starting from 12c, with dynamic statistics set at the new level 11, the CBO does not calculate the selectivity of the predicate. Instead, it starts by determining an optimal sample size, then estimates a fraction (a portion) of the whole result set, and finally extrapolates the whole result set from that fraction.
At this stage of the investigation a new question arises: what will happen when the size of the sampled table increases?
It seems natural that increasing the table size implies a reduction of the sample percentage, because sampling a large segment is always a time-consuming operation. Will this reduction in sample size imply a reduction in cardinality estimation accuracy? A picture being worth a thousand words, let's explore this plausible scenario via a concrete example:

create table t_s as select * from dba_objects;
 
insert into t_s select * from t_s;
 
insert into t_s select * from t_s;
 
insert into t_s select * from t_s;
 
insert into t_s select * from t_s;
 
insert into t_s select * from t_s;
 
insert into t_s select * from t_s;
 
select segment_name, bytes/1024/1024 MB
from user_segments
where segment_name='T_S';
 
SEGMENT_NAME            MB
--------------- ----------
T_S                 776.00
 
exec dbms_stats.gather_table_stats('sh','t_s',
method_opt=>'for all columns size skewonly');
 
Against this simple model I will issue two different queries and get their execution plans, using dynamic statistics set at level 11, as shown below:
First query
 
select count(*) from t_s where owner='SYS' and object_type='TABLE'
 
SQL_ID  ckutv7jm28ru9, child number 0
-------------------------------------
--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |  57840 |  82048 |
--------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("OBJECT_TYPE"='TABLE' AND "OWNER"='SYS'))
 
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
 
Second query
 
select count(*) from t_s where owner='SH' and object_type='TABLE'
 
SQL_ID  2w2wwxaznd0kg, child number 0
-------------------------------------
--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |    371 |   1600 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("OWNER"='SH' AND "OBJECT_TYPE"='TABLE'))
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
 
As you can see from the above execution plans, when I used a table (T_S) with a large segment size (776MB), Oracle was not able to produce accurate estimations using dynamic statistics at level 11. This is in contrast to the case of the CUSTOMERS table, with its relatively small size (13MB), where the CBO estimation was approximately correct. In the case of the T_S table, Oracle scanned a very small part of the segment and then extrapolated the whole query cardinality from it.
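This loss of accuracy can be reasoned about with a back-of-the-envelope binomial sampling model. This is my own illustration, not Oracle's algorithm, and the ~0.8% sample fraction is borrowed from the SAMPLE BLOCK(0.826182, 8) clause observed later in this article for the similarly sized CUST table:

```python
import math

def relative_std_error(n_rows, match_fraction, sample_fraction):
    """Approximate relative standard error of the extrapolated count when
    sample_fraction of n_rows is examined and each examined row matches
    the predicate independently with probability match_fraction.
    (Simple binomial model; real block sampling behaves even worse when
    matching rows are clustered in a few blocks.)"""
    sampled = n_rows * sample_fraction
    # std dev of matches in the sample, scaled up by 1/sample_fraction,
    # divided by the true matching count n_rows * match_fraction
    return (math.sqrt(sampled * match_fraction * (1 - match_fraction))
            / (sample_fraction * n_rows * match_fraction))

# CUSTOMERS-like case: ~55,500 rows, ~52% sample
small_tab = relative_std_error(55_500, 3341 / 55_500, 0.515796)
# T_S-like case: millions of rows, sample fraction shrunk to ~0.8%
big_tab = relative_std_error(3_552_000, 1600 / 3_552_000, 0.008262)

print(f"small table: {small_tab:.1%}  big table: {big_tab:.1%}")
```

The expected relative error jumps from a few percent on the small table to tens of percent on the big one, which is consistent with the 371 versus 1600 misestimate above.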
 
I also want to emphasize another important point, which concerns the relationship between dynamic statistics (at level 11) and the presence of skewed data (columns with histograms) in the sampled segment. People still wrongly believe that the CBO, and particularly the new dynamic statistics sampling method, is able to compensate for missing statistics (like histograms and extended statistics) and produce a correct cardinality. This, of course, cannot happen with dynamic statistics set at level 11. Moreover, we saw above that even when extended statistics and adequate histograms are present, Oracle, when instructed to use dynamic statistics at level 11, bypasses these existing dictionary statistics and uses only information it can gather during plan compilation, which obviously excludes computing histograms and extended statistics dynamically at parse time.
 
As a matter of confirmation, if, for the above T_S table example, I disable automatic dynamic sampling and create extended statistics for the owner and object_type columns, then the execution plan for the second query looks like this:
 
SQL_ID  2w2wwxaznd0kg, child number 0
-------------------------------------
select count(*) from t_s where owner='SH' and object_type='TABLE'
 
--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |   1600 |   1600 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("OWNER"='SH' AND "OBJECT_TYPE"='TABLE'))
 
The cardinality estimation is now perfect thanks to the column group statistics on (owner, object_type).
 
All the above observations and investigations concern dynamic statistics at level 11. What happens now if I drop the extended column statistics and instruct the CBO to use dynamic statistics at a lower level (9, for example)?
 
BEGIN
DBMS_STATS.DROP_EXTENDED_STATS( 'sh', 't_s',
'("OWNER","OBJECT_TYPE")' );
END;
/
 
alter system flush shared_pool;
exec  DBMS_RESULT_CACHE.FLUSH;
alter session set optimizer_dynamic_sampling=9;
 
select count(*) from t_s where owner='SH' and object_type='TABLE'
 
SQL_ID  avw6r9kqfvx9r, child number 0
-------------------------------------
select count(*) from t_s where owner='SH'and object_type='TABLE'
 
--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |   1763 |   1600 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("OWNER"='SH'AND "OBJECT_TYPE"='TABLE'))
Note
-----
- dynamic statistics used: dynamic sampling (level=9)
 
With dynamic statistics set at a lower level (9) we get a better estimation than with dynamic statistics set at the new 12c level (11).
 
If I now re-create the extended statistics for the owner and object_type columns, gather statistics with histograms, and instruct the CBO to use dynamic statistics at level 9, I end up with the following execution plan:
 
exec dbms_stats.gather_table_stats('sh','t_s',method_opt=>'for all columns size skewonly for columns(owner, object_type) size skewonly');
 
alter system flush shared_pool;
exec  DBMS_RESULT_CACHE.FLUSH;
alter session set optimizer_dynamic_sampling=9;
 
select count(*) from t_s where owner='SH' and object_type='TABLE'
 
SQL_ID  avw6r9kqfvx9r, child number 0
-------------------------------------
select count(*) from t_s where owner='SH'and object_type='TABLE'
Plan hash value: 2004207966
--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |   1600 |   1600 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("OWNER"='SH'AND "OBJECT_TYPE"='TABLE'))
 
As you can see, the Oracle optimizer did not use dynamic sampling at level 9 in the presence of statistics, even when instructed to do so. This means that when optimizer_dynamic_sampling is set to a level different from 11, the CBO can detect the presence of accurate (or unrepresentative) statistics and decide whether it is worthwhile to use or to bypass the dynamic statistics feature. This is not the case when dynamic statistics is set at the new level 11: there, dynamic statistics supersedes every other available statistic and, unfortunately and wrongly, uses dynamic sampling for its cardinality estimations.
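The behavior observed in these tests can be summarized in a small sketch. This encodes my reading of the experiments above, not any documented Oracle internals:

```python
def uses_dynamic_sampling(level, has_adequate_stats):
    """Hypothetical decision sketch based on the tests in this article:
    level 11 always samples, while lower levels back off when
    representative statistics already exist."""
    if level == 0:
        return False           # dynamic sampling disabled
    if level == 11:
        return True            # new 12c ADS level: samples even when
                               # good (extended) statistics exist
    # levels 1-10: sample only when statistics are missing or judged
    # insufficient for the query
    return not has_adequate_stats

# CUSTOMERS test: level 11 sampled despite column group statistics
print(uses_dynamic_sampling(11, True))   # -> True
# T_S test: level 9 bypassed sampling once extended statistics existed
print(uses_dynamic_sampling(9, True))    # -> False
# ...but did sample after the extended statistics were dropped
print(uses_dynamic_sampling(9, False))   # -> True
```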
 
2. Group By clause


I will be using the following query to investigate the behavior of dynamic statistics in the presence of a group by clause:
SELECT count(*)
FROM   sh.customers
group by COUNTRY_ID,CUST_STATE_PROVINCE;
 
The execution plan of this query is given by:

SQL_ID  gp0cpb1kutsr9, child number 0
-------------------------------------
 
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |    145 |
|   1 |  HASH GROUP BY     |           |      1 |   1949 |    145 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |  55500 |  55500 |
-------------------------------------------------------------------
 
The optimizer, having no clue about the correlation between country_id and cust_state_province, produced a wrong estimation for the HASH GROUP BY operation (n°1 above).
Let's then instruct the CBO to use the new 12c dynamic sampling level, run the query again, and note what happens:

alter system flush shared_pool;
exec  DBMS_RESULT_CACHE.FLUSH;
alter session set optimizer_dynamic_sampling=11;
 
SQL_ID  gp0cpb1kutsr9, child number 0
-------------------------------------
 
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |    145 |
|   1 |  HASH GROUP BY     |           |      1 |    145 |    145 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |  55500 |  55500 |
-------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
 
As you can observe in the above execution plan, the HASH GROUP BY operation n°1 is now perfectly estimated by the CBO.
From the corresponding 10046 trace file we can isolate the following interesting information:

SQL ID: 7b9ma1hfgd25c
Plan Hash: 922166714
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT COUNT(*) "COUNT(*)
" FROM "SH"."CUSTOMERS""CUSTOMERS" GROUP BY "CUSTOMERS"."COUNTRY_ID",
"CUSTOMERS"."CUST_STATE_PROVINCE") "VW_DIS_1") innerQuery
 
 
call     count    cpu  elapsed disk  query    current   rows
------- ------  -----  ------- ----- -------  -------  -----
Parse        1   0.00     0.00     0       0        0      0
Execute      1   0.00     0.00     0       0        0      0
Fetch        1   0.01     0.01     0    1522        0      1
------- ------  -----  ------- -----  ------  -------   ----
total        3   0.01     0.01     0    1522        0      1
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 105     (recursive depth: 1)
 
Rows     Row Source Operation
-------  ---------------------------------------------------
1        RESULT CACHE  6rw3j6xwtuzga8umg05k5624x2 (cr=1522 pr=0 pw=0 time=18273 us)
1         SORT AGGREGATE (cr=1522 pr=0 pw=0 time=18205 us)
145        VIEW  VM_NWVW_0 (cr=1522 pr=0 pw=0 time=18183 us cost=591 size=5847 card=1949)
145         HASH GROUP BY (cr=1522 pr=0 pw=0 time=18156 us cost=591 size=31184 card=1949)
55500        TABLE ACCESS FULL CUSTOMERS (cr=1522 pr=0 pw=0 time=7690 us cost=586 size=888000 card=55500)
 

Interesting. In order to get an accurate estimation of the group by operation, Oracle simply executed the query and got 145; executing the query became a step in the plan optimization process! Moreover, Oracle actually executed this query twice (you will have to take my word for it, since I have not posted the part of the trace file that shows this double execution: the first execution uses "VW_DIS_1" and the second one, visible in the 10053 trace file, uses "VW_DIS_2"). I still have not figured out why Oracle needs this double execution at parse time. Generally, when the first sample is not accurate, a second sampling is done, resulting in a second query execution. But in this particular case there was no sampling (or, if you prefer, a 100% sample), so I think the second execution is useless.

Having said that, I now want to extend this group by test to a larger table. For that, I created a CUST table as follows:

create table cust as select * from customers;

and did several doubling inserts (insert into cust select * from cust) until the table size reached 760MB.

SEGMENT_NAME            MB
--------------- ----------
CUST                 760.00
 
And here below is the corresponding 10046 trace file for the same group by query, this time against the new, bigger CUST table:

SQL ID: 5qa7dunz4udqc
Plan Hash: 3471111787
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT COUNT(*) "COUNT(*)
" FROM "SH"."CUST""CUST" GROUP BY "CUST"."COUNTRY_ID",
"CUST"."CUST_STATE_PROVINCE") "VW_DIS_1") innerQuery
 
call     count    cpu elapsed    disk   query current   rows
------- ------  ----- ------- ------- ------- -------  -----
Parse        1   0.00    0.00       0       0       0      0
Execute      1   0.00    0.00       0       0       0      0
Fetch        1   1.31    1.31   64914   72386       0      0
------- ------  -----   ----- ------- ------- -------  -----
total        3   1.31   1.31    64914   72386       0      0
 
Rows     Row Source Operation
-------  ---------------------------------------------------
0        RESULT CACHE  8j9xaaqaqzg8q419n87dbp51vu (cr=0 pr=0 pw=0 time=1 us)
0        SORT AGGREGATE (cr=0 pr=0 pw=0 time=31 us)
0        VIEW  VM_NWVW_0 (cr=0 pr=0 pw=0 time=27 us cost=591 size=5847 card=1949)
0        HASH GROUP BY (cr=0 pr=0 pw=0 time=26 us cost=591 size=31184 card=1949)
2524125  TABLE ACCESS FULL CUST (cr=72386 pr=64914 pw=0 time=849426 us cost=586 size=888000 card=55500)
 
You can easily see that this time Oracle did not fully execute the query. The optimizer considered that this method would not be efficient because the object is quite "big". So it decided to fall back to the old method of estimating the group by clause, with which it generated the following execution plan:
 
SQL_ID  1j4cppr9wz91r, child number 0
-------------------------------------
SELECT count(*)   FROM   sh.cust  group by
COUNTRY_ID,CUST_STATE_PROVINCE
 
--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    145 |
|   1 |  HASH GROUP BY     |      |      1 |   1949 |    145 |
|   2 |   TABLE ACCESS FULL| CUST |      1 |  55500 |   3552K|
--------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
 
However, the above Note wrongly indicates that dynamic statistics at level 11 has been used. This is not completely true because, during parsing, the CBO abandoned this sampling method due to the huge table size which, according to the CBO, would make sampling inefficient.
Notice also that this last 10046 trace file contains a few SQL statements that tried to estimate the cardinality of the CUST table and to determine its column statistics (num_distinct, etc.), as reproduced below:

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel  */ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "CUST")  */ 1 AS C1 FROM
"SH"."CUST" SAMPLE BLOCK(0.826182, 8) SEED(1)  "CUST") innerQuery
 
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel  */ sum(vsize(C1))/count(*) ,
COUNT(DISTINCT C2), sum(vsize(C3))/count(*) , COUNT(DISTINCT C4)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "CUST")  */
"CUST"."CUST_STATE_PROVINCE" AS C1, "CUST"."COUNTRY_ID" AS C2,
"CUST"."COUNTRY_ID" AS C3, "CUST"."CUST_STATE_PROVINCE" AS C4 FROM
"SH"."CUST" SAMPLE BLOCK(0.826182, 8) SEED(1)  "CUST") innerQuery
 
These are queries whose results were ultimately ignored during plan optimization.

3. Conclusion

Automatic dynamic sampling (ADS) at level 11 can be efficient and generate accurate cardinality estimations for small tables. The main principle of ADS at this level is to measure a fraction of the query result (a subset) and extrapolate the whole result set from it. Therefore, as table size increases, the CBO will gradually lose estimation accuracy. This is why, in most real-life systems, this feature will probably not be of much help, particularly if there is significant data skew. What seems even less convenient is that, in contrast to dynamic statistics set at a level different from 11, the presence of adequate and representative statistics, including histograms and extended column group statistics, is ignored when the CBO is instructed to use dynamic statistics at level 11. In such situations, a different dynamic statistics level can be more accurate than the new ADS level. Finally, queries using a group by clause on big tables will not use dynamic sampling at level 11 even when instructed to do so, even though the Note accompanying the corresponding execution plan claims the contrary.