01 December, 2010

Using V$SESSION_LONGOPS

I have noticed a number of responses on forums suggesting that V$SESSION_LONGOPS can be used to monitor long running queries and gather statistics execution. Also, a similar response to a previous blog post.

As I have pointed out in the forums posting, V$SESSION_LONGOPS reports *operations*, not Queries and Sessions. A query or a PLSQL block can consist of multiple operations. Even a Nested Loop Join can manifest as multiple operations where the driven table is scanned using FullTableScan repeatedly.
V$SESSION_LONGOPS reports each operation separately, not the whole SQL that caused the multiple operations.

In this example, below, I show how any computation of "how much of the gather statistics has been done" and a "projection" of total time based on V$SESSION_LONGOPS would be misleading.

In one session, I run this SQL from approx 06:45:09 to approx 07:02:52 :

SQL> desc STORE_LIST
Name Null? Type
------------------------------------------------------------------------ -------- -------------------------------------------------
STORE_ID NOT NULL NUMBER
COUNTRY VARCHAR2(30)
STORE_TYPE VARCHAR2(19)
REGISTRATION_DATE DATE

SQL>

SQL> exec dbms_stats.gather_table_stats(USER,'STORE_LIST',estimate_percent=>100);

SQL> select num_rows, sample_size, blocks, blocks*8192/1048576 Size_MB from user_tables where table_name = 'STORE_LIST';

where the results are :

PL/SQL procedure successfully completed.

Elapsed: 00:17:43.17

NUM_ROWS SAMPLE_SIZE BLOCKS SIZE_MB
---------- ----------- ---------- ----------
110327040 110327040 507160 3962.1875


So, I ran GATHER_TABLE_STATS on a table with 110million rows and 3,962MB in size from 06:45:09 to 07:02:52.

I had another session polling V$SESSION_LONGOPS approximately every 30seconds. The query was :

select sid, opname, target, sofar, totalwork,
units, to_char(start_time,'HH24:MI:SS') StartTime,
time_remaining, message, username
from v$session_longops
where sofar != totalwork
order by start_time
/



These are some of the results :

At 06:45:41
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 49110 507160 Blocks 06:45:09
289
Table Scan: HEMANT.STORE_LIST: 49110 out of 507160 Blocks done
HEMANT

If I had extrapolated from here, I would have expected the Gather Stats to take another 289seconds -- i.e. run till 06:49:30.

Another snapshot :

At 06:47:57
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 272042 507160 Blocks 06:45:09
144
Table Scan: HEMANT.STORE_LIST: 272042 out of 507160 Blocks done
HEMANT

This indicated that 53% (272042 of 507160 blocks) had been read and the estimated end time would be 06:50:21 now (06:47:57 plus 144seconds).

A subsequent snapshot :

At 06:49:51
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 453007 507160 Blocks 06:45:09
34
Table Scan: HEMANT.STORE_LIST: 453007 out of 507160 Blocks done
HEMANT

The expected end time was now 06:50:34 (06:49:51 plus 34seconds). (Notice that the expected end time is slowly creeping forward ... but we can live with that -- we assume that Oracle is "refining" it's estimate).

All of a sudden, something changed in the next snapshot :

At 06:50:42
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 63023 507160 Blocks 06:50:21
148
Table Scan: HEMANT.STORE_LIST: 63023 out of 507160 Blocks done
HEMANT

Why did "SOFAR" (i.e. the number of blocks read so far) suddenly shrink (or reset ?) from 453,007 to 63,023 ? The expected end time is now 06:53:10 (06:50:42 plus 148seconds).

Here's the reason :
The SQL statement that had been running since 06:45:09 was :

SQL_ID : 5urpb1azwjnwv
select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitor
ing no_substrb_pad */count(*), sum(sys_op_opnsize("STORE_ID")), count("COUNTRY"), count(distinct "COUNTRY"), sum(sys_op_opnsize(
"COUNTRY")), substrb(dump(min("COUNTRY"),16,0,32),1,120), substrb(dump(max("COUNTRY"),16,0,32),1,120), count("STORE_TYPE"), coun
t(distinct "STORE_TYPE"), sum(sys_op_opnsize("STORE_TYPE")), substrb(dump(min("STORE_TYPE"),16,0,32),1,120), substrb(dump(max("S
TORE_TYPE"),16,0,32),1,120), count("REGISTRATION_DATE"), count(distinct "REGISTRATION_DATE"), substrb(dump(min("REGISTRATION_DAT
E"),16,0,32),1,120), substrb(dump(max("REGISTRATION_DATE"),16,0,32),1,120) from "HEMANT"."STORE_LIST" t

Sometime between 06:49:50 and 06:50:42 (at 06:50:21, according to the V$SESSION_LONGOPS output), the running SQL changed to :

SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt

So, the GATHER_TABLE_STATS actually runs multiple, separate SQL statements ! In this case, V$SESSION_LONGOPS was reset by a Fresh FullTableScan initiated for the new SQL !

Further down the timeline, I see :

At 06:52:37
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 457529 507160 Blocks 06:50:21
15
Table Scan: HEMANT.STORE_LIST: 457529 out of 507160 Blocks done
HEMANT

But at 06:53:28 :
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
15523 166760 Blocks 06:52:52
341
Sort/Merge: : 15523 out of 166760 Blocks done
HEMANT


So, the Table Scan operation that began at 06:50:21 was succeeded by a Sort/Merge operation that began at 06:52:52. The estimated end time was now 06:59:09 (06:53:28 plus 341seconds).

However, monitoring the SQL for the session :

At 06:53:27 :
SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt

Again at 06:54:59 :
SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt


So, the SQL that began at 06:50:21 was still continuing at 06:54:59 but the Operation had changed at 06:52:52 and so V$SESSION_LONGOPS was showing a new operation with a new start time and a new estimated end time.

We have, so far, seen :
1. SQL has changed, resetting V$SESSION_LONGOPS
2. Operation within an SQL has changed (from Table Scan to Sort/Merge), resetting V$SESSION_LONGOPS again

Subsequently, from the snapshot at 06:56:44, the "TIME_REMAINING" is NULL :

At 06:56:44
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
167882 166760 Blocks 06:52:52

Sort/Merge: : 167882 out of 166760 Blocks done
HEMANT

At 06:57:35
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
210688 166760 Blocks 06:52:52

Sort/Merge: : 210688 out of 166760 Blocks done
HEMANT

At 06:58:57
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
255326 166760 Blocks 06:52:52

Sort/Merge: : 255326 out of 166760 Blocks done
HEMANT


You may have also noticed that "SOFAR" is actually well in excess of "TOTALWORK". Seemingly, Oracle has been doing more work than the total work for this SORT/MERGE operation ! (Why ? !)

Then, the information changes again :

At 07:00:22
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort Output
150657 162822 Blocks 06:59:27
4
Sort Output: : 150657 out of 162822 Blocks done
HEMANT

The view now shows that a Sort Output operation began at 06:59:27.
The SQL statement was still the same :

At 07:02:21
SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt


So, within this SQL, there have been 3 different Operations reported by V$SESSION_LONGOPS.


Within a single GATHER_TABLE_STATS call, we have seen 2 SQLs and at least 4 different Operations, each Operation being a fresh entry in V$SESSION_LONGOPS.
And I have not even thrown in the complication of Column Statistics being gathered by METHOD_OPT being "FOR ALL COLUMNS SIZE AUTO" being the default. Since I haven't run any queries against the table, the "SIZE AUTO" finds nothing in SYS.COL_USAGE$ and creates no Histograms :

SQL> l
1 select column_name, num_distinct, num_nulls, histogram, num_buckets from user_tab_col_statistics
2* where table_name = 'STORE_LIST'
SQL> /

COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ---------- --------------- -----------
STORE_ID 306464 0 NONE 1
COUNTRY 42 0 NONE 1
STORE_TYPE 44 0 NONE 1
REGISTRATION_DATE 5896 0 NONE 1

SQL>


Had Oracle decided to gather Histograms on the columns, there would have been many more SQLs and many more Operations from the GATHER_TABLE_STATS call. Reading V$SESSION_LONGOPS would have been very confusing as each new Operation would enter with a new STARTTIM ! And there are no Indexes to cascade gather statistics, else indexes would have cause more SQLs, more Operations, more entries V$SESSION_LONGOPS.


Therefore, it is clear that any reading of VSESSION_LONGOPS for the GATHER_TABLE_STATS on the STORE_LIST table is misleading as the (SOFAR and TOTALWORK) and (STARTTIM and TIME_REMAINING) figures are reset with each new entry.
This is the complete listing of V$SESSION_LONGOPS entries : 4 distinct operations for one GATHER_TABLE_STATS :

SQL> l
1 select to_char(sysdate,'DD-MON HH24:MI:SS') Collection_DateStamp, sid, opname, target, sofar, totalwork,
2 units, to_char(start_time,'HH24:MI:SS') StartTime,
3 time_remaining, message, username
4 from v$session_longops
5 where 1=1
6* order by start_time
SQL> /

COLLECTION_DATESTAMP SID OPNAME
------------------------ ---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
01-DEC 08:03:44 43 Table Scan
HEMANT.STORE_LIST 507160 507160 Blocks 06:45:09
0
Table Scan: HEMANT.STORE_LIST: 507160 out of 507160 Blocks done
HEMANT

01-DEC 08:03:44 43 Table Scan
HEMANT.STORE_LIST 507160 507160 Blocks 06:50:21
0
Table Scan: HEMANT.STORE_LIST: 507160 out of 507160 Blocks done
HEMANT

01-DEC 08:03:44 43 Sort/Merge
166760 166760 Blocks 06:52:52
0
Sort/Merge: : 166760 out of 166760 Blocks done
HEMANT

01-DEC 08:03:44 43 Sort Output
162822 162822 Blocks 06:59:27
0
Sort Output: : 162822 out of 162822 Blocks done
HEMANT


SQL>



The first entry that appeared in V$SESSION_LONGOPS was *not* indicative of all the operations in the GATHER_TABLE_STATS. Also, I could never be sure if all operations had completed, simply by monitoring V$SESSION_LONGOPS. I kept seeing new operations appearing. (Had Column Histograms and Index statistics been gathered, there would have been that many more operations --- Column Histograms are particularly dicey when SAMPLE_SIZE is 'AUTO' and METHOD_OPT is 'FOR ALL COLUMNS SIZE AUTO' because Oracle can keep running against the same column repeatedly, each time with a different sample size).

.
.

NOTE : Another example of misreading V$SESSION_LONGOPS for a DML (UPDATE) statement is published here.

.
.
.

Most Popular Posts - Nov 10

Blogger's "Stats" feature shows that the 3 most popular posts in the past 30 days have been :

1. AUTOEXTEND ON Next Size : 325 pageviews

.
.
.

21 November, 2010

Oracle VM Templates released

Oracle has released a number of VM Templates, for different product/technology stacks. See http://www.oracle.com/technetwork/server-storage/vm/templates-101937.html

.
.
.

Some Common Errors - 7 - "We killed the job because it was hung"

Continuing the "Common Errors" series ...

I have come across occasions when a Developer or a DBA has "killed" a job only because "it was hung". If your user or Developer points you to a job that "seems to be 'hung'", as a DBA, it is your responsibility to determine the status of the job from the database instance perspective, not from the user's perspective. Besides the GUI Enterprise Manager screens, Oracle provides numerous views that you can use to monitor the job. Learn to make use of V$SESSION, V$SESS_IO, V$SESSTAT, V$SESSION_WAIT, V$SESSION_EVENT, V$ACTIVE_SESSION_HISTORY to monitor a session.

Never kill and restart a job without at least determining (or to a reasonable degree of confidence) what caused the job to appear to be 'hung' and what action you need to take to ensure that the behaviour doesn't repeat the next time the job is re-run.
If you merely kill a job and expect it to perform better when it is restarted, without determining and fixing the causes, you are likely to incur the displeasure of the user.

As a professional, one of the attributes expected of you is "being able to diagnose a cause for a symptom".

.
.
.

15 November, 2010

"SET TIME ON" in RMAN

"SET TIME ON" in SQLPlus is a SQLPlus client command. It displays the system time from the SQLPlus client program.

If you want RMAN to display "time" you can use NLS_DATE_FORMAT as below :

[oracle@localhost ~]$ env |grep NLS
[oracle@localhost ~]$ NLS_DATE_FORMAT=DD_MON_YYYY_HH24:MI:SS;export NLS_DATE_FORMAT
[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Mon Nov 15 06:45:50 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL (DBID=1229390655)

RMAN> backup database;

Starting backup at 15_NOV_2010_06:45:55
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=54 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=47 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/addtl/oracle/oradata/orcl/users01.dbf
input datafile file number=00002 name=/home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/orcl/system01.dbf
input datafile file number=00003 name=/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
input datafile file number=00005 name=/home/oracle/app/oracle/oradata/orcl/example01.dbf
input datafile file number=00006 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1046101119510758.dbf
input datafile file number=00007 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1146416395631714.dbf
input datafile file number=00008 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1170420963682633.dbf
channel ORA_DISK_1: starting piece 1 at 15_NOV_2010_06:45:58
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00014 name=/addtl/oracle/oradata/orcl/hemant01.dbf
input datafile file number=00015 name=/addtl/oracle/oradata/orcl/hemant02.dbf
input datafile file number=00016 name=/addtl/oracle/oradata/orcl/hemant03.dbf
input datafile file number=00009 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1194425963955800.dbf
input datafile file number=00010 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1218408858999342.dbf
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1242310449730067.dbf
input datafile file number=00012 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1266412439758696.dbf
input datafile file number=00013 name=/home/oracle/app/oracle/oradata/orcl/APEX_1295922881855015.dbf
channel ORA_DISK_2: starting piece 1 at 15_NOV_2010_06:45:58
channel ORA_DISK_2: finished piece 1 at 15_NOV_2010_06:46:01
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_nnndf_TAG20101115T064557_6g2kx70g_.bkp tag=TAG20101115T064557 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_2: starting piece 1 at 15_NOV_2010_06:46:03
channel ORA_DISK_2: finished piece 1 at 15_NOV_2010_06:46:06
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_ncnnf_TAG20101115T064557_6g2kxcc6_.bkp tag=TAG20101115T064557 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_2: starting piece 1 at 15_NOV_2010_06:46:07
channel ORA_DISK_2: finished piece 1 at 15_NOV_2010_06:46:08
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_nnsnf_TAG20101115T064557_6g2kxh4q_.bkp tag=TAG20101115T064557 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:02
...



Notice how the RMAN output shows the Tmestamp in DD_MON_YYYY_HH24:MI:SS format. You could specify an alternate format as well.

[oracle@localhost ~]$ NLS_DATE_FORMAT=DD/MM/YYYY_HH24_MI_SS;export NLS_DATE_FORMAT
[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Mon Nov 15 06:50:47 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL (DBID=1229390655)

RMAN> backup datafile 1 ;

Starting backup at 15/11/2010_06_50_53
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=47 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=49 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/orcl/system01.dbf
channel ORA_DISK_1: starting piece 1 at 15/11/2010_06_50_56
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_2: starting piece 1 at 15/11/2010_06_51_00
channel ORA_DISK_2: finished piece 1 at 15/11/2010_06_51_16
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_ncnnf_TAG20101115T065055_6g2l6obt_.bkp tag=TAG20101115T065055 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:16
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_2: starting piece 1 at 15/11/2010_06_51_17
channel ORA_DISK_1: finished piece 1 at 15/11/2010_06_51_32
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_nnndf_TAG20101115T065055_6g2l6jdx_.bkp tag=TAG20101115T065055 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:36
channel ORA_DISK_2: finished piece 1 at 15/11/2010_06_51_32
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_nnsnf_TAG20101115T065055_6g2l763s_.bkp tag=TAG20101115T065055 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:15
Finished backup at 15/11/2010_06_51_32

RMAN>


Thus, I can change the format of the time display with NLS_DATE_FORMAT.

I can also display backup timestamps etc in the desired format :

[oracle@localhost ~]$ NLS_DATE_FORMAT=DD_MON_RR_HH24_MI_SS
[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Mon Nov 15 06:54:15 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL (DBID=1229390655)

RMAN> list backup of datafile 1;

using target database control file instead of recovery catalog

List of Backup Sets
===================


BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ------------------
21 Full 666.52M DISK 00:00:48 26_OCT_10_07_17_31
BP Key: 21 Status: AVAILABLE Compressed: NO Tag: TAG20101026T071642
Piece Name: /addtl/oracle/flash_recovery_area/ORCL/backupset/2010_10_26/o1_mf_nnndf_TAG20101026T071642_6dfrpcdn_.bkp
List of Datafiles in backup set 21
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------ ----
1 Full 5323127 26_OCT_10_07_16_43 /home/oracle/app/oracle/oradata/orcl/system01.dbf

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ------------------
27 Full 953.77M DISK 00:04:42 26_OCT_10_07_23_15
BP Key: 27 Status: AVAILABLE Compressed: YES Tag: TAG20101026T071833
Piece Name: /addtl/oracle/flash_recovery_area/ORCL/backupset/2010_10_26/o1_mf_nnndf_TAG20101026T071833_6dfrst78_.bkp
List of Datafiles in backup set 27
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------ ----
1 Full 5323217 26_OCT_10_07_18_33 /home/oracle/app/oracle/oradata/orcl/system01.dbf

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ------------------
34 Full 668.63M DISK 00:00:34 15_NOV_10_06_51_30
BP Key: 34 Status: AVAILABLE Compressed: NO Tag: TAG20101115T065055
Piece Name: /addtl/oracle/flash_recovery_area/ORCL/backupset/2010_11_15/o1_mf_nnndf_TAG20101115T065055_6g2l6jdx_.bkp
List of Datafiles in backup set 34
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------ ----
1 Full 5442510 15_NOV_10_06_50_56 /home/oracle/app/oracle/oradata/orcl/system01.dbf

RMAN>


This is useful when you are looking for backups within a particular time range and yo uhave had multiple backups done on the same day. The format can be modified to show Hours_Minutes_Seconds as well.
.
.
.

01 November, 2010

Most Popular Posts - Oct 10

Blogger's "Stats" feature shows that the 3 most popular posts in the past 30 days have been :

3. AUTOEXTEND ON Next Size : 198 pageviews

.
.
.

20 October, 2010

How the Optimizer can use Constraint Definitions

A simple demonstration of how properly defined Referential Integrity constraints can be used by the Optimizer.....

If two tables (say SALES and SALES_LINES) have a Parent-Child relationship but the proper R.I. constraint is not defined, it is possible (either through faulty application code that uses two separate transactions for an INSERT {or a DELETE} against the two tables OR through erroneous adhoc updates to the data) to have "dangling" child records.
For example, an INSERT into SALES errors out (for lack of space in that tablespace) but the corresponding INSERT into SALES_LINES succeeds and commits, when executed as a separate transaction.
Or an adhoc "data-fix" operation deletes rows from the SALES table without having deleted them from the SALES_LINES table first.

Having UNIQUE Indexes on the two tables will not prevent the occurrence of "parent-less children" !

When querying the SALES_LINES table, it might be necessary to validate that the corresponding SALES row exists. This could be done as a join between the two tables, explicitly included in every query against the SALES_LINES table.

Assuming that the two tables have 100,000 rows (each sale having only 1 line), with proper Unique Indexes, a query for one PROD_ID of 10 different products may execute as :


SQL> -- Query for Total Sales for PROD_ID=5
SQL> select sum(l.quantity), sum(l.quantity*l.price)
2 from sales_lines l, sales s
3 where
4 -- join code is used to validate that the sale_id is legitimate
5 l.sale_id = s.sale_id
6 and l.prod_id=5
7 and l.sale_id between 25000 and 35000
8 /

SUM(L.QUANTITY) SUM(L.QUANTITY*L.PRICE)
--------------- -----------------------
2441786 48892118.9


Execution Plan
----------------------------------------------------------
Plan hash value: 146457679

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 139 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 39 | | |
| 2 | NESTED LOOPS | | 984 | 38376 | 139 (0)| 00:00:02 |
|* 3 | TABLE ACCESS BY INDEX ROWID| SALES_LINES | 984 | 33456 | 139 (0)| 00:00:02 |
|* 4 | INDEX RANGE SCAN | SALES_LINES_UK | 10000 | | 24 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SALES_UK | 1 | 5 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


3 - filter("L"."PROD_ID"=5)
4 - access("L"."SALE_ID">=25000 AND "L"."SALE_ID"<=35000)

5 - access("L"."SALE_ID"="S"."SALE_ID")
filter("S"."SALE_ID"<=35000 AND "S"."SALE_ID">=25000)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
281 consistent gets
0 physical reads
0 redo size
523 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

However, if I were to define the Constraints as :

SQL> REM REM ######################################
SQL> REM Now add the Constraint Definitions !
SQL> alter table SALES add constraint SALES_PK primary key (sale_id);

Table altered.

SQL> alter table SALES_LINES add constraint SALES_LINES_FK foreign key (sale_id) references SALES (sale_id);

Table altered.

SQL> REM REM ######################################

and I re-run the query :

SQL> select sum(l.quantity), sum(l.quantity*l.price)
2 from sales_lines l, sales s
3 where
4 -- join code is used to validate that the sale_id is legitimate
5 l.sale_id = s.sale_id
6 and l.prod_id=5
7 and l.sale_id between 25000 and 35000
8 /

SUM(L.QUANTITY) SUM(L.QUANTITY*L.PRICE)
--------------- -----------------------
2441786 48892118.9


Execution Plan
----------------------------------------------------------
Plan hash value: 2517766180

-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 139 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 34 | | |
|* 2 | TABLE ACCESS BY INDEX ROWID| SALES_LINES | 984 | 33456 | 139 (0)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | SALES_LINES_UK | 10000 | | 24 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("L"."PROD_ID"=5)
3 - access("L"."SALE_ID">=25000 AND "L"."SALE_ID"<=35000)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
139 consistent gets
0 physical reads
0 redo size
523 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Oracle can simply eliminate the lookup on the SALES table and do much fewer consistent gets.
The presence of the constraint ensures that every SALE_ID in the (child) SALES_LINES table *does* have corresponding SALE_ID in the (parent) SALES table. Since I am not fetching any columns from the SALES table, the join is now unnecessary and the optimizer (smartly) eliminates the join.

.

UPDATE : See the Optimizer team's blog post on Table (Join) Elimination.

.

.
.

08 October, 2010

Featured in Oracle Magazine

The November-December 2010 issue of Oracle Magazine includes a short interview in the Peer-to-Peer column.

.
.
.

02 October, 2010

Data Skew and Cardinality Changing --- 2

In my earlier post, I demonstrated how Data Skew can change over time, and the difference between Oracle's Cardinality estimates and the real row count can increase. That was with a Partitioned Table (partitioned by Time).

I now demonstrate the behaviour with a non-Partitioned Table, using the same data. The SALES_NP table captures Sales orders for various types of products. PROD_ID=31 represents '3.5" diskettes". (More details are available in the previous post).

I first create SALES_NP with data upto the Year 2000 :

SQL> drop table sales_np purge;

Table dropped.

SQL> create table sales_np as select * from sales_partitioned where 1=2;

Table created.

SQL> create index sales_np_prod_ndx on sales_np(prod_id);

Index created.

SQL> create index sales_np_time_ndx on sales_np(time_id);

Index created.

SQL>
SQL> -- Populate the table with data upto Year 2000
SQL> insert into sales_np select * from sales_partitioned where time_id < to_date('01-JAN-2001','DD-MON-YYYY');

659425 rows created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','SALES_NP',estimate_percent=>100,granularity=>'ALL',degree=>4,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

PL/SQL procedure successfully completed.

SQL>


I then verify the row counts :

SQL> -- What are the total number of Sales records by year ?
SQL> select /*+ FULL(s) PARALLEL (s 4) */ to_char(time_id,'YYYY'), count(*)
2 from sales_np s
3 group by to_char(time_id,'YYYY')
4 order by 1
5 /

TO_C COUNT(*)
---- ----------
1998 178834
1999 247945
2000 232646

SQL>
SQL> -- How many sales exist for the range 01-Jan-1998 to 31-Dec-2001 ?
SQL> select count(*)
2 from sales_np
3 where time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
4 /

COUNT(*)
----------
659425

SQL>
SQL> -- How many sales of 3.5" diskettes were made in the range 01-Jan-1998 to 31-Dec-2001 ?
SQL> select prod_id, to_char(time_id,'YYYY'), count(*)
2 from sales_np
3 where
4 prod_id=31
5 and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
6 group by prod_id,to_char(time_id,'YYYY')
7 order by 1,2
8 /

PROD_ID TO_C COUNT(*)
---------- ---- ----------
31 1998 6602
31 1999 6586
31 2000 7568

SQL>
SQL> -- How many sales 3.5" diskettes were made in 2000 ?
SQL> select prod_id, to_char(time_id,'YYYY'), count(*)
2 from sales_np
3 where
4 prod_id=31
5 and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
6 group by prod_id,to_char(time_id,'YYYY')
7 order by 1,2
8 /

PROD_ID TO_C COUNT(*)
---------- ---- ----------
31 2000 7568

SQL>

So, the Year 2000 saw 7,568 sales orders for 3.5" diskettes.

Let's see Oracle's cardinality estimates for each of the years :

SQL> REM REM ###################################################### #################
SQL> -- Verify the Cardinality Estimates for selected years
SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 945744872

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7829 | 93948 | 347 (1)| 00:00:05 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 7829 | 93948 | 347 (1)| 00:00:05 |
|* 2 | INDEX RANGE SCAN | SALES_NP_PROD_NDX | 20756 | | 67 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("TIME_ID" "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("PROD_ID"=31)

16 rows selected.

SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 945744872

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7339 | 88068 | 347 (1)| 00:00:05 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 7339 | 88068 | 347 (1)| 00:00:05 |
|* 2 | INDEX RANGE SCAN | SALES_NP_PROD_NDX | 20756 | | 67 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TIME_ID" 2 - access("PROD_ID"=31)

16 rows selected.

SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3261648362

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 228 | 40 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 19 | 228 | 40 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SALES_NP_TIME_NDX | 602 | | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("PROD_ID"=31)
2 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TIME_ID"
16 rows selected.

SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3261648362

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 228 | 28 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 19 | 228 | 28 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SALES_NP_TIME_NDX | 401 | | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("PROD_ID"=31)
2 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TIME_ID"
16 rows selected.

SQL>

So we know the estimates for the years 1999 and 2000 (at 7829 and 7339 rows) aren't too far off the real row counts (6586 and 7568).

I now insert the data for the year 2001 :

SQL> -- let's insert 2001 data
SQL>
SQL> insert /*+ APPEND */ into sales_np
2 select *
3 from sales_partitioned
4 where 1=1
5 and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
6 and decode(prod_id,31,mod(cust_id,3),0)=0
7 /

257877 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> -- the number of total sales orders in 2001
SQL> select count(*)
2 from sales_np
3 where time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
4 /

COUNT(*)
----------
257877

SQL>
SQL> -- the number of sales orders for 3.5" diskettes in 2001
SQL> select count(*)
2 from sales_np
3 where time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
4 and prod_id=31
5 /

COUNT(*)
----------
811

SQL>
SQL> -- Gather Statistics
SQL> exec dbms_stats.gather_table_stats('','SALES_NP',estimate_percent=>100,granularity=>'ALL',degree=>4,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

PL/SQL procedure successfully completed.

SQL>

So I have inserted 811 sales orders for 3.5" diskettes in the year 2001. (In the previous example, with a Partitioned Table, it was 2,352 orders in the year 2001 and 811 in the year 2002).

Let's re-check the Cardinality estimates. Remember : I did NOT change the count of rows in the years 1998 to 2000 ! They are still the same. I have only added new rows in 2001 such that we have much fewer orders for 3.5" diskettes, although the total number of rows for 2001 actually is higher than that for 2000. I have increased the skew for 3.5" diskettes -- representing significantly depressed sales for this product alone.


SQL> -- ReVerify the Cardinality Estimates by year
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 945744872

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5852 | 70224 | 364 (1)| 00:00:05 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 5852 | 70224 | 364 (1)| 00:00:05 |
|* 2 | INDEX RANGE SCAN | SALES_NP_PROD_NDX | 21567 | | 72 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("TIME_ID" "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("PROD_ID"=31)

16 rows selected.

SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 945744872

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5477 | 65724 | 364 (1)| 00:00:05 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 5477 | 65724 | 364 (1)| 00:00:05 |
|* 2 | INDEX RANGE SCAN | SALES_NP_PROD_NDX | 21567 | | 72 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TIME_ID" 2 - access("PROD_ID"=31)

16 rows selected.

SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 945744872

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6076 | 72912 | 364 (1)| 00:00:05 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 6076 | 72912 | 364 (1)| 00:00:05 |
|* 2 | INDEX RANGE SCAN | SALES_NP_PROD_NDX | 21567 | | 72 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TIME_ID" 2 - access("PROD_ID"=31)

16 rows selected.

SQL>
SQL> explain plan for
2 select prod_id
3 from sales_np
4 where prod_id=31
5 and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
6 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3261648362

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 180 | 46 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES_NP | 15 | 180 | 46 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SALES_NP_TIME_NDX | 628 | | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("PROD_ID"=31)
2 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"TIME_ID"
16 rows selected.

SQL>


Notice how the discrepancy between Actual Row Counts and Estimated Row Counts has significantly increased -- even for earlier years were there was absolutely no change in the data. Before inserting year 2001 data, the discrepancy between Actual and Estimated for the year 2000 was only 229 (7568-7339). With 2001 data, the discrepancy increased to 2191 (7568-5477). The discrepancy for the year 2001 is now very significant --- against an actual count of 811 rows, the estimated count is 6076 rows !


Actual versus Estimated Cardinality for PROD_ID=31


Year Actual Estimated Estimated
before inserting after inserting
Year 2001 rows Year 2001 rows
1999 6586 7829 5852
2000 7568 7339 5477
2001 0 19 na
2001 inserted 811 na 6076
2002 0 19 15



Why do we see such discrepancies ?
A. Estimated Row Counts (Cardinality Estimates) are based on Column Statistics.
B. Column Statistics are at the Table level (across all the years, not for specific years).
C. As data skew changes, the estimated counts get "distributed" across the years because the estimates are averaged out across the whole table.

.
.
.

01 October, 2010

Most Popular Posts

Blogger's "Stats" feature shows that the 3 most popular posts in the past 30 days have been :
3. NLS_DATE_FORMAT : 144 pageviews

.
.
.

26 September, 2010

Data Skew changing over time --- and the Cardinality Estimate as well !

What happens if the nature of "data skew" changes over time ? For example, in the first few years of operations, you manufacture and sell 4 types of widgets ("A", "B", "C" and "D") where the volume of sales is approximately equally distributed amongst the 4 (Sales may be increasing over the years, but the distribution -- i.e. proportions -- remains the same). After say, 5 years, you add new widgets ("X", "Y" and "Z") to your product line. You notice that sales for "A" and "B" are now actually declining while sales for "Z" over the next 3 years are growing much faster than "C", "D", "X" and "Y".
What you have is changing data skew.
How does the optimizer estimate cardinality ?

Column Statistics used for a Histogram are at the *Table* level. A Histogram of the frequency of occurrences of these widgets in your sales would be computed merely as the number of occurrences across the table -- without accounting for dates. At the end of year 4, Oracle may say that "A" accounted for 25% of sales. At the end of year 8, "A" may now account for only 5% of total sales across all 8 years. And that is why the histogram would say. However, the histogram would not now (at the end of 8 years) say that "A" still accounts for 25% of sales in the first 4 years ! The information content is now "normalised" -- the metadata loses some vital information. (Of course, actually querying the table still shows you that "A" accounts for 25% of sales in the first 4 years, but the Optimizer's statistics can no longer reflect this knowledge). The Optimizer's cardinality estimates after 8 years are actually "weaker" because it has "normalised" (or "averaged out") the skew.


I've takan the "SH" schema's SALES table (this is available as one of the EXAMPLE schemas) from an Oracle 10.2 installation to demonstrate this. I take the example of 3.5inch diskettes whose sales decline. To simplify the case, I count the number of Sales Orders, not the quantities across all orders (for example, we could assume that each Sales Order is for exactly 1000 diskettes).

First, I start with SALES information for the years 1998 to 2002 :
SQL> exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,granularity=>'ALL',degree=>4,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

PL/SQL procedure successfully completed.

SQL>
SQL> -- What is PROD_ID 31 ?
SQL> select prod_id, prod_name
  2  from products
  3  where prod_id=31
  4  /

   PROD_ID PROD_NAME
---------- --------------------------------------------------
        31 1.44MB External 3.5" Diskette

SQL>
SQL> -- What are the range of dates in the Sales (history) table ?
SQL> select min(time_id), max(time_id)
  2  from times
  3  /

MIN(TIME_ MAX(TIME_
--------- ---------
01-JAN-98 31-DEC-02

SQL>
SQL> -- What are the total number of Sales records by year ?
SQL> select /*+ FULL(s) PARALLEL (s 4) */ to_char(time_id,'YYYY'), count(*)
  2  from sales s
  3  group by to_char(time_id,'YYYY')
  4  order by 1
  5  /

TO_C   COUNT(*)
---- ----------
1998     178834
1999     247945
2000     232646
2001     259418

SQL>
SQL> -- How many sales exist for the range 01-Jan-1998 to 31-Dec-2001 ?
SQL> select count(*)
  2  from sales
  3  where time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  4  /

  COUNT(*)
----------
    918843

SQL>
SQL> -- How many sales of 3.5" diskettes were made in the range 01-Jan-1998 to 31-Dec-2001 ?
SQL> select prod_id, to_char(time_id,'YYYY'), count(*)
  2  from sales
  3  where
  4  prod_id=31
  5  and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  group by prod_id,to_char(time_id,'YYYY')
  7  order by 1,2
  8  /

   PROD_ID TO_C   COUNT(*)
---------- ---- ----------
        31 1998       6602
        31 1999       6586
        31 2000       7568
        31 2001       2352

SQL>
SQL> -- How many sales 3.5" diskettes were made in  2002 ?
SQL> select prod_id, to_char(time_id,'YYYY'), count(*)
  2  from sales
  3  where
  4  prod_id=31
  5  and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  6  group by prod_id,to_char(time_id,'YYYY')
  7  order by 1,2
  8  /

no rows selected

SQL> 

We know that gross Sales (for all products) have (generally) been increasing over the years -- from 179thousand orders in 1998 to 259thousand in 2001 (with a slight dip in the year 2000).
However, sales of 3.5" diskettes declined in 2001.
Sales for 2002 are not recorded.

We look at the Optimizer's cardinality estimates based on available statistics :
SQL> -- Verify the Cardinality Estimates by year
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-1999','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  4478 | 53736 |    22  (10)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  4478 | 53736 |    22  (10)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  4478 | 53736 |     6   (0)| 00:00:01 |     4 |     8 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  4478 | 53736 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     4 |     8 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  4478 | 53736 |    15   (7)| 00:00:01 |     4 |     8 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  4478 | 53736 |    15   (7)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     4 |     8 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              1999-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  6275 | 75300 |    27   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  6275 | 75300 |    27   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  6275 | 75300 |     6   (0)| 00:00:01 |     8 |    12 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  6275 | 75300 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     8 |    12 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  6275 | 75300 |    20   (5)| 00:00:01 |     8 |    12 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  6275 | 75300 |    20   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     8 |    12 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5859 | 70308 |    25   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5859 | 70308 |    25   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5859 | 70308 |     6   (0)| 00:00:01 |    12 |    16 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5859 | 70308 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    12 |    16 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5859 | 70308 |    18   (6)| 00:00:01 |    12 |    16 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5859 | 70308 |    18   (6)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    12 |    16 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  6528 | 78336 |    27   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  6528 | 78336 |    27   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  6528 | 78336 |     6   (0)| 00:00:01 |    16 |    20 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  6528 | 78336 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    16 |    20 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  6528 | 78336 |    20   (5)| 00:00:01 |    16 |    20 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  6528 | 78336 |    20   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    16 |    20 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1295169200

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |    16 |   192 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR    |                |    16 |   192 |     4   (0)| 00:00:01 |    20 |    24 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                |    16 |   192 |     4   (0)| 00:00:01 |       |       |
|   3 |    BITMAP AND                |                |       |       |            |          |       |       |
|   4 |     BITMAP MERGE             |                |       |       |            |          |       |       |
|*  5 |      BITMAP INDEX RANGE SCAN | SALES_TIME_BIX |       |       |            |          |    20 |    24 |
|*  6 |     BITMAP INDEX SINGLE VALUE| SALES_PROD_BIX |       |       |            |          |    20 |    24 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   6 - access("PROD_ID"=31)

20 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2002','DD-MON-YYYY') and time_id < to_date('01-JAN-2004','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2294783259

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                |    16 |   192 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR          |                |    16 |   192 |     2   (0)| 00:00:01 |    24 |    28 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |    16 |   192 |     2   (0)| 00:00:01 |    24 |    28 |
|   3 |    BITMAP CONVERSION TO ROWIDS     |                |       |       |            |          |       |       |
|*  4 |     BITMAP INDEX RANGE SCAN        | SALES_TIME_BIX |       |       |            |          |    24 |    28 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PROD_ID"=31)
   4 - access("TIME_ID">TO_DATE(' 2002-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

18 rows selected.
SQL>
SQL> 


We can see that the estimates for PROD_ID=31 do not match reality. Particularly for 2001.

Let's now create 2002 data :
SQL> -- let's create  2002 sales data
SQL> alter index sales_time_bix modify partition sales_q1_2002 unusable;

Index altered.

SQL> alter index sales_time_bix modify partition sales_q2_2002 unusable;

Index altered.

SQL> alter index sales_time_bix modify partition sales_q3_2002 unusable;

Index altered.

SQL> alter index sales_time_bix modify partition sales_q4_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q1_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q2_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q3_2002 unusable;

Index altered.

SQL> alter index sales_prod_bix modify partition sales_q4_2002 unusable;

Index altered.

SQL>
SQL>
SQL> insert /*+ APPEND */ into sales
  2  select prod_id, cust_id, time_id+365, channel_id, promo_id, quantity_sold, amount_sold
  3  from sales
  4  where 1=1
  5  and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  and decode(prod_id,31,mod(cust_id,3),0)=0
  7  /

257877 rows created.

SQL> alter index sales_time_bix rebuild partition sales_q1_2002 ;

Index altered.

SQL> alter index sales_time_bix rebuild partition sales_q2_2002 ;

Index altered.

SQL> alter index sales_time_bix rebuild partition sales_q3_2002 ;

Index altered.

SQL> alter index sales_time_bix rebuild partition sales_q4_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q1_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q2_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q3_2002 ;

Index altered.

SQL> alter index sales_prod_bix rebuild partition sales_q4_2002 ;

Index altered.

SQL>
SQL>
SQL> -- the number of total sales orders in 2002
SQL> select count(*)
  2  from sales
  3  where time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  4  /

  COUNT(*)
----------
    257877

SQL>
SQL> -- the number of sales orders for 3.5" diskettes in 2002
SQL> select count(*)
  2  from sales
  3  where time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  4  and prod_id=31
  5  /

  COUNT(*)
----------
       811

SQL> -- so we had only 811 sales orders for 3.5" diskettes
SQL>
SQL> -- Gather Statistics
SQL> exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,granularity=>'ALL',degree=>4,-
> method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

PL/SQL procedure successfully completed.

SQL> 


Thus, although gross sales for all products did not decline significantly in 2002, sales for 3.5" diskettes, at 811 orders, were only 1/3rd the number in 2001.

Let's revisit the Optimizer's cardinality estimates :
SQL> -- ReVerify the Cardinality Estimates by year
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-1999','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  3636 | 43632 |    22  (10)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  3636 | 43632 |    22  (10)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  3636 | 43632 |     6   (0)| 00:00:01 |     4 |     8 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  3636 | 43632 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     4 |     8 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  3636 | 43632 |    15   (7)| 00:00:01 |     4 |     8 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  3636 | 43632 |    15   (7)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     4 |     8 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              1999-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5047 | 60564 |    27   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5047 | 60564 |    27   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5047 | 60564 |     6   (0)| 00:00:01 |     8 |    12 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5047 | 60564 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |     8 |    12 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5047 | 60564 |    20   (5)| 00:00:01 |     8 |    12 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5047 | 60564 |    20   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |     8 |    12 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  4755 | 57060 |    26   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  4755 | 57060 |    26   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  4755 | 57060 |     6   (0)| 00:00:01 |    12 |    16 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  4755 | 57060 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    12 |    16 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  4755 | 57060 |    19   (6)| 00:00:01 |    12 |    16 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  4755 | 57060 |    19   (6)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    12 |    16 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID"              "TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5277 | 63324 |    28   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5277 | 63324 |    28   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5277 | 63324 |     6   (0)| 00:00:01 |    16 |    20 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5277 | 63324 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    16 |    20 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5277 | 63324 |    21   (5)| 00:00:01 |    16 |    20 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5277 | 63324 |    21   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    16 |    20 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2009851720

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |  5250 | 63000 |    28   (8)| 00:00:01 |       |       |
|*  1 |  VIEW                          | index$_join$_001 |  5250 | 63000 |    28   (8)| 00:00:01 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR    |                  |  5250 | 63000 |     6   (0)| 00:00:01 |    20 |    24 |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |  5250 | 63000 |     6   (0)| 00:00:01 |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX   |       |       |            |          |    20 |    24 |
|   6 |    PARTITION RANGE ITERATOR    |                  |  5250 | 63000 |    21   (5)| 00:00:01 |    20 |    24 |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |  5250 | 63000 |    21   (5)| 00:00:01 |       |       |
|*  8 |      BITMAP INDEX RANGE SCAN   | SALES_TIME_BIX   |       |       |            |          |    20 |    24 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   2 - access(ROWID=ROWID)
   5 - access("PROD_ID"=31)
   8 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"              2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

25 rows selected.

SQL>
SQL>
SQL> explain plan for
  2  select prod_id
  3  from sales
  4  where prod_id=31
  5  and time_id > to_date('31-DEC-2002','DD-MON-YYYY') and time_id < to_date('01-JAN-2004','DD-MON-YYYY')
  6  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1295169200

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |    13 |   156 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR    |                |    13 |   156 |     4   (0)| 00:00:01 |    24 |    28 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                |    13 |   156 |     4   (0)| 00:00:01 |       |       |
|   3 |    BITMAP AND                |                |       |       |            |          |       |       |
|   4 |     BITMAP MERGE             |                |       |       |            |          |       |       |
|*  5 |      BITMAP INDEX RANGE SCAN | SALES_TIME_BIX |       |       |            |          |    24 |    28 |
|*  6 |     BITMAP INDEX SINGLE VALUE| SALES_PROD_BIX |       |       |            |          |    24 |    28 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("TIME_ID">TO_DATE(' 2002-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TIME_ID"   6 - access("PROD_ID"=31)

20 rows selected.

SQL> 


Oracle now seems to have continued "normalising" (or "averaging out") the distribution across all the years. The discrepancy between the actual count of rows and the estimated count of rows has increased significantly.
Actual versus Estimated Cardinality for PROD_ID=31


Year            Actual  Estimated  Estimated
                before inserting   after inserting
                Year 2002 rows     Year 2002 rows

1998              6602       4478       3636   
1999              6586       6275       5047
2000              7568       5859       4755
2001              2352       6528       5277
2002                 0         16         na
2002 inserted      811         na       5250
2003                 0         16         13
                                                               


Although the number of occurrences of Sales Orders for PROD_ID=31 has declined considerably in 2001 and 2002, the Optimizer cannot adjust for the decline. The discrepancy between Actual and Estimated row counts increases over time as the skew changes.
The Cardinality estimate is computed from Column statistics (number of distinct values, selectivity, histogram) at the Table level but combined with Partition row counts without any knowledge of the occurrences of PRODUCT_ID=31 in each of the target Partitions for the year 2002.


SUGGESTED TESTS For Readers :
1. What if we had a BTree Index ?
2. What if the SALES table wasn't Partitioned ?


UPDATE : See the next post, with a Non-Partitioned Table
.
.
.