15 May, 2010

Cardinality Estimation

The Explain Plan output for an SQL statement shows the Execution Plan that the Optimizer has determined to be the "best" to satisfy the submitted SQL statement.
The Cardinality (which I define as "number of rows retrieved for a certain step in the execution plan") estimation done by the Optimizer is critical.

When a Table has an Index that can be used to satisfy a query, how does Oracle estimate Cardinality ? *From the Table's Column Statistics*. It then uses the Index to determine if, *for the estimated number of rows*, the Index would be an "optimal" retrieval method. Remember, it doesn't start with the Index, it starts with the estimated row count. This is based on the "selectivity" of the given query predicates.

Here is a simple demonstration of how the Optimizer doesn't start with the Index statistics but, rather, with the Table and Column Statistics :

First, I create a table with only 1 row and create two indexes on it. Remember that, in 10g (my example here is on 10.2.0.4), a CREATE INDEX automatically includes "COMPUTE STATISTICS".

SQL> drop table my_test_table purge;

Table dropped.

SQL> create table my_test_table as select * from dba_objects where 1=0;

Table created.

SQL> insert into my_test_table select * from dba_objects where object_id = 501;

1 row created.

SQL> create index my_test_objid_ndx on my_test_table(object_id);

Index created.

SQL> create index my_test_owner_ndx on my_test_table(owner);

Index created.

SQL>

Obviously, the statistics on the two indexes show the presence of only 1 row in each index.

I next insert 50,725 rows of which 1,001 rows will satisfy my first test query -- on OBJECT_ID -- and 77 rows will satisfy my second test query -- on OWNER.

SQL> insert into my_test_table select * from dba_objects where object_id is not null;

50725 rows created.

SQL> alter table my_test_table modify (object_id not null);

Table altered.

SQL>
SQL> select count(*) from my_test_table where object_id between 1001 and 2001;

COUNT(*)
----------
1001

SQL> select count(*) from my_test_table where owner = 'HEMANT';

COUNT(*)
----------
77

SQL>


I next Gather Statistics on the Table, including allowing Oracle to compute Histograms on the Columns. I explicitly deny permission to update statistics on the Indexes.

================================================
Side bar : 10g defaults CASCADE to DBMS_STATS.AUTO_CASCADE rather than "TRUE" or "FALSE". This default allows Oracle to "automatically" determine if it should update statistics on the Indexes. Had I left the CASCADE parameter to default, Oracle would have updated statistics on the Indexes as well, as the (again, another 10g default "automatic", table monitoring would have shown that more than 10% of the table has been updated.
================================================


SQL> -- Table statistics are updated
SQL> exec dbms_stats.gather_table_stats('','MY_TEST_TABLE',estimate_percent=>100,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>FALSE);

PL/SQL procedure successfully completed.

SQL> -- note above : Index statistics are not updated !
SQL>
SQL> select 'Table :', num_rows, blocks from user_tables where table_name = 'MY_TEST_TABLE'
2 union
3 select 'Ind OBJID_NDX : ' , num_rows, leaf_blocks from user_indexes where table_name = 'MY_TEST_TABLE'
4 and index_name = 'MY_TEST_OBJID_NDX'
5 union
6 select 'Ind OWNER_NDX : ' , num_rows, leaf_blocks from user_indexes where table_name = 'MY_TEST_TABLE'
7 and index_name = 'MY_TEST_OWNER_NDX'
8 /

'TABLE:' NUM_ROWS BLOCKS
---------------- ---------- ----------
Ind OBJID_NDX : 1 1
Ind OWNER_NDX : 1 1
Table : 50726 748

SQL>

Index Statistics reflect only 1 row but Table Statistics reflect the full Table. Although I haven't presented Column Statistics and Histograms, I know that they have been gathered. (You could verify this for yourself if you run a similar test).

I now run my first test query.

SQL> -- Execution Plans and Expected Row Counts
SQL> explain plan for select object_id, owner, object_name from my_test_table
2 where object_id between 1001 and 2001;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4163210948

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 36000 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| MY_TEST_TABLE | 1000 | 36000 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | MY_TEST_OBJID_NDX | 1000 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID">=1001 AND "OBJECT_ID"<=2001) 14 rows selected. SQL>

Oracle has estimated 1,000 values from the Index and a resulting 1,000 values from the Table.
We know that the Index statistics show that the Index MY_TEST_OBJID_NDX has only 1 row present. Yet, how and why did the Optimizer revert with the estimate of 1,000 ?
It computed this expectation from the Column statistics.
This is the information in the Event 10053 trace :

Table Stats::
Table: MY_TEST_TABLE Alias: MY_TEST_TABLE
#Rows: 50726 #Blks: 748 AvgRowLen: 93.00
Index Stats::
Index: MY_TEST_OBJID_NDX Col#: 4
LVLS: 0 #LB: 1 #DK: 1 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
Index: MY_TEST_OWNER_NDX Col#: 1
LVLS: 0 #LB: 1 #DK: 1 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
***************************************
SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Column (#4): OBJECT_ID(NUMBER)
AvgLen: 5.00 NDV: 50725 Nulls: 0 Density: 1.9714e-05 Min: 2 Max: 55982
Histogram: HtBal #Bkts: 250 UncompBkts: 250 EndPtVals: 251
Table: MY_TEST_TABLE Alias: MY_TEST_TABLE
Card: Original: 50726 Rounded: 1000 Computed: 999.53 Non Adjusted: 999.53
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------
Access Path: TableScan
Cost: 205.24 Resp: 205.24 Degree: 0
Cost_io: 204.00 Cost_cpu: 18613351
Resp_io: 204.00 Resp_cpu: 18613351
Access Path: index (RangeScan)
Index: MY_TEST_OBJID_NDX
resc_io: 2.00 resc_cpu: 14653
ix_sel: 0.019704 ix_sel_with_filters: 0.019704
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange Index: MY_TEST_OBJID_NDX
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 999.53 Bytes: 0

Thus, the "Card" estimate is from the Column Statistics. The Table has 50,726 rows, the Column has 50,725 distinct values (shown as NDV from the column statistics),ranging from 2 to 55,982. Oracle doesn't know if all the values are Integers but it computes an expected cardinality of 999.53 which is rounded up to 1,000. It also knows that these are most likely all distinct values. Therefore, the Index would also have 1,000 distinct entries !
(Note how the 10053 trace does show that the Index Statistics reflect only 1 single row, there being only 1 Leaf Block, 1 Distinct Key, 1 Leaf Block per Key and 1 Data Block per Key and a Clutering Factor of 1).

I now run my second test query :

SQL> explain plan for select object_id, owner, object_name from my_test_table
2 where owner = 'HEMANT';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3831530497

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 77 | 2772 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| MY_TEST_TABLE | 77 | 2772 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | MY_TEST_OWNER_NDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OWNER"='HEMANT')

14 rows selected.

SQL>

Here, Oracle estimated 77 rows in the table would satisfy the query predicate OWNER='HEMANT'. It also expects only one entry for 'HEMANT' in the Index -- this one entry is expected to return 77 rowids.

This is what the 10053 trace shows :

Table Stats::
Table: MY_TEST_TABLE Alias: MY_TEST_TABLE
#Rows: 50726 #Blks: 748 AvgRowLen: 93.00
Index Stats::
Index: MY_TEST_OBJID_NDX Col#: 4
LVLS: 0 #LB: 1 #DK: 1 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
Index: MY_TEST_OWNER_NDX Col#: 1
LVLS: 0 #LB: 1 #DK: 1 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
***************************************
SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Column (#1): OWNER(VARCHAR2)
AvgLen: 6.00 NDV: 28 Nulls: 0 Density: 9.8569e-06
Histogram: Freq #Bkts: 28 UncompBkts: 50726 EndPtVals: 28
Table: MY_TEST_TABLE Alias: MY_TEST_TABLE
Card: Original: 50726 Rounded: 77 Computed: 77.00 Non Adjusted: 77.00
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------
Access Path: TableScan
Cost: 205.03 Resp: 205.03 Degree: 0
Cost_io: 204.00 Cost_cpu: 15476657
Resp_io: 204.00 Resp_cpu: 15476657
Access Path: index (AllEqRange)
Index: MY_TEST_OWNER_NDX
resc_io: 2.00 resc_cpu: 14653
ix_sel: 1 ix_sel_with_filters: 1
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange Index: MY_TEST_OWNER_NDX
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 77.00 Bytes: 0

The "OWNER" column has a Frequency Histogram with 28 Buckets capturing all 28 Distinct Values. Therefore, it actually has an exact count of how many rows had "OWNER"='HEMANT' at the time that Table (and Column) Statistics were gathered. Since the query is for an Equality Range Scan on the Index, (see the "Access Path: index (AllEqRange)", it knows that the Index has only 1 "HEMANT" entry.


So, remember, "selectivity" information on columns determines the expected Cardinality which Oracle then uses to evaluate an Index, if available. It doesn't start with the Index statistics. Table and Column Statistics are more important than Index Statistics.

.
.
.

No comments: