16 January, 2008

The Impact of the Clustering Factor

Update 21-Feb-08: I had used this blog posting as a reference in my comment on Richard Foote's blog posting on Clustering Factor in concatenated index. As Tom Kyte pointed out, my posting wasn't clear enough, so I've tried to explain it again in my comments at Richard Foote's blog posting Clustering Factor in concatenated indexes :

I've pasted those comments again at the bottom of this posting.

Recently, I was given a query that was taking 40minutes to execute.This query involved a range scan on table_1 , among many other operationson other tables with joins.


The greatest portion of the effort [reading the 10046 trace file] was :
STAT #1 id=15 cnt=22 pid=5 pos=2 obj=866159 op='TABLE ACCESS BY INDEX ROWID table_1 '
STAT #1 id=16 cnt=3354870 pid=15 pos=1 obj=14815979 op='INDEX RANGE SCAN index_a '

{an additional filter was also being applied at operation 15}
The whole Fetch phase of that query :

FETCH #1:c=95480000,e=2366647483,p=691119,cr=1353212,cu=0,mis=0,r=1,dep=0,og=4

Event waited on Times Max. Wait Total Waited
----------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 0.00 0.00
db file sequential read 691119 0.87 2293.16
log file sync 1 0.01 0.01

read 1.353million blocks, requiring 691thousand physical block reads and
took 2.3billion microseconds – ie, 2,367 seconds.
The actual total elapsed time that SQLPlus reported was 40min24seconds (2,424 seconds).
The 691thousand physical reads accounted for 2,293 seconds --- 94.6% of the query’s
response time. {BTW : Just for our knowledge : What is the Hit Ratio ? 48.9%}


On rewriting the query with a NO_INDEX hint, the next execution showed :

STAT #1 id=16 cnt=16 pid=5 pos=2 obj=866159 op='TABLE ACCESS BY INDEX ROWID table_1 '
STAT #1 id=17 cnt=3314718 pid=16 pos=1 obj=14815976 op='INDEX RANGE SCAN index_b '

What I now got was :

FETCH #1:c=10320000,e=257057504,p=78328,cr=159300,cu=0,mis=0,r=1,dep=0,og=4

Event waited on Times Max. Wait Total Waited
----------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 0.00 0.00
db file sequential read 78328 0.44 218.20
buffer busy waits 9715 0.40 30.53
latch free 26 0.02 0.41

Only 159,300 Buffer Gets and 257million microseconds execution time.
There were 78,328 single block Physical Read calls, taking 218sconds only.
SQLPlus reports the total query execution time was 4min24seconds.
{Thus, although Physical Reads still account for 82.6% or 84.8% of the total response time,
the absolute elapsed time is less than 5 minutes – much better than the 40minutes}.
Reducing the number of Physical Read calls was the solution.
{BTW : Just for our knowledge : What is the Hit Ratio ? 50.8% -- not much of an improvement from 48.9% ?}

Both indexes are multi-column indexes on the same table.

So the question was :
“In the first SQL RowSourceOperations there 3.354million rowids fetched from index_a and the total count of Buffer Gets for the whole query was 1.353million.
In the second SQL’s RowSourceOperations there 3.315million rowids fetched from index_b and yet the total count of Buffer Gets for the whole query was only 159,300 !
Why the difference ?”

The difference lay in the Clustering Factor.

SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, distinct_keys
2 from dba_indexes
3 where owner = 'schema'
4 and index_name in ('index_b','index_a');
INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ----------- -----------------------
AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR DISTINCT_KEYS
----------------------- ----------------- -------------
index_b 463706 1 13 3490380 261796
index_a 456250 1 4 21444164 4845082
SQL>

Index index_b has a much lower Clustering Factor. This means that the number of Table Blocks
to be visited for the ROWIDs returned from an Index Range Scan is much lower than the count of blocks that a session using ROWIDs from index_a would have to read.
As inserts into table table_1 are generally ordered by column_x, the co-locality of lookups by column_x would be higher. That is why rowids returned by column_x from index_b would most likely find Table rows in contiguous blocks, not having to make multiple read calls on the Table.

Update 21-Feb-08, Since the above posting might not have been clear enough, this is how I tried to re-explain it at in my comments on Richard Foote's blog posting Clustering Factor in concatenated indexes :
0. Yes it is a range scan.

1. There are two indexes on the table. Both are multicolumn indexes. “index_a” is the bad index which the optimizer was choosing to use initially.

2. “index_a” resulted in this RowSourceOperation entry :
STAT #1 id=16 cnt=3354870 pid=15 pos=1 obj=14815979 op=’INDEX RANGE SCAN index_a ‘ and this for the Fetch :
FETCH #1:c=95480000,e=2366647483,p=691119,cr=1353212,cu=0,mis=0,r=1,dep=0,og=4

Thus, Oracle using this index in the Range Scan was doing 1.35million consistent gets.

3. I needed to disable usage of this index. I knew that there was a better index — index_b. I had 4 choices :
a) Drop index_a
b) Apply a function to the predicate
c) Manipulate column and index statistics
d) Put an explicit NO_HINT index
Actually, I could have tried the 5th option
e) Put an explicit HINT (table_1 index_b).
But I didn’t do that. I just wanted to see what alternate execution plan (not just index_b but the whole plan, nested loops whatever) Oracle chose withouth index_a.

4. The NO_INDEX Hint therefore, wasn’t really intended as the solution. My blog posting doesn’t assert it that way. I just said that I used a NO_INDEX Hint — the reason only was I wanted to disable usage of that index.

5. Oracle then switched to index_b. This resulted in :
STAT #1 id=17 cnt=3314718 pid=16 pos=1 obj=14815976 op=’INDEX RANGE SCAN index_b ‘
and
FETCH #1:c=10320000,e=257057504,p=78328,cr=159300,cu=0,mis=0,r=1,dep=0,og=4

6. The question, then, was :
“In the first SQL RowSourceOperations there 3.354million rowids fetched from index_a and the total count of Buffer Gets for the whole query was 1.353million.
In the second SQL’s RowSourceOperations there 3.315million rowids fetched from index_b and yet the total count of Buffer Gets for the whole query was only 159,300 !
Why the difference ?”

7. The assertion “The difference lay in the Clustering Factor” was actually an answer to that exact question. I didn’t assert outright that the changed execution plan or better performance was a result of the Clustering Factor.

8. These were the statistics :
INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
—————————— ———– ———————–
AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR DISTINCT_KEYS
———————– —————– ————-
index_b 463706 1 13 3490380 261796
index_a 456250 1 4 21444164 4845082

9. What I tried to put forward was that the better Clustering Factor with index_b was that although Oracle did read just as many rowids in the index range scan, the number of block gets for the *table* was much lower — the total consistent gets from 1.35million to only 159,300 !

10. Obviously, the number of rows is still similar. The number of *blocks* is much lower. That is my assertion about the Clustering Factor.

11. index_b’s leading column is a date column and rows are inserted into the table by date — ascending order. Much better clustering of rows into the same block and contiguous blocks when I fetch them ordered by the date.

12. So, the Clustering Factor didn’t result in inspection of fewer index entries. It did result in a significant reduction in consistent gets. Unfortunately, the row source operation statistics don’t really show how many index blocks and how many table blocks were read but I’m pretty sure that, because it was a Range Scan, the reduction was in table blocks.


No comments: