27 April, 2008

Row Sizes and Sort Operations

The Row Size can have a signicant impact on a sort operation. A SORT operation would be required not only when an explicit SORT BY is specified but also for operations like GROUP BY (10g has introduced "Group By Hash" operations to speed up Grouping without doing explicit sorts, but there are some bugs with this feature).

Here is a test case where I create a "TEST_SORT_TABLE" and index it on "ID_COLUMN" and "ITEM_TYPE".




-- create the source table with slightly large row lengths
drop table test_sort_table;
create table test_sort_table (id_column number not null, item_name varchar2(36) not null, item_type varchar2(19) not null, owner_name varchar2(30) not null, other_cols varchar2(128)) ;
alter table test_sort_table nologging;
insert /*+ APPEND */ into test_sort_table
2 select object_id, substr(object_name,1,32), object_type, owner, rpad('a',124,'f')
3 from dba_objects
4 where object_id is not null
5 order by substr(object_type,1,6)substr(object_name,4,9) ;
commit;
insert /*+ APPEND */ into test_sort_table
2 select * from test_sort_table
3 union all select * from test_sort_table
4 order by substr(item_name,1,8)substr(owner_name,2,4);
commit;
create index test_sort_table_ndx_1 on test_sort_table (id_column , item_name);
exec dbms_stats.gather_table_Stats('','TEST_SORT_TABLE',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100,cascade=>TRUE);

-- get a "feel" for the size of the table (row count, row length, block count)
select count(*) from test_sort_table;

COUNT(*)
----------
155025
select num_rows, avg_row_len, blocks from user_tables where table_name = 'TEST_SORT_TABLE';

NUM_ROWS AVG_ROW_LEN BLOCKS
---------- ----------- ----------
155025 168 3835
select num_rows, blevel, leaf_blocks from user_indexes where index_name = 'TEST_SORT_TABLE_NDX_1';

NUM_ROWS BLEVEL LEAF_BLOCKS
---------- ---------- -----------
155025 2 865

explain plan for
2 select /* only the index columns */ id_column, item_name from test_sort_table order by id_column , item_name ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2301815179

------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 4541K 873 (1) 00:00:11
1 INDEX FULL SCAN TEST_SORT_TABLE_NDX_1 155K 4541K 873 (1) 00:00:11
------------------------------------------------------------------------------------------

explain plan for
2 select /* one additional column */ id_column, item_name, item_type from test_sort_table order by id_column , item_name ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1390622354

----------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
----------------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 5904K 2429 (2) 00:00:30
1 SORT ORDER BY 155K 5904K 15M 2429 (2) 00:00:30
2 TABLE ACCESS FULL TEST_SORT_TABLE 155K 5904K 851 (2) 00:00:11
----------------------------------------------------------------------------------------------

explain plan for
2 select /* all columns, not ordered */ * from test_sort_table ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2991649059

-------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 24M 852 (2) 00:00:11
1 TABLE ACCESS FULL TEST_SORT_TABLE 155K 24M 852 (2) 00:00:11
-------------------------------------------------------------------------------------

explain plan for
2 select /* all columns, ordered */ * from test_sort_table order by id_column , item_name ;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1390622354

----------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
----------------------------------------------------------------------------------------------
0 SELECT STATEMENT 155K 24M 6597 (1) 00:01:20
1 SORT ORDER BY 155K 24M 55M 6597 (1) 00:01:20
2 TABLE ACCESS FULL TEST_SORT_TABLE 155K 24M 852 (2) 00:00:11
----------------------------------------------------------------------------------------------





In the first query, I query for only the indexed columns and request an ORDER BY on
these columns. Oracle is able to do an "Index Full Scan" (which retrieves the values *in order*) for all 155thousand rows at a "COST" of 873.

In the second query, I request 3 columns (ie, including one additional column not in the index). Since Oracle has to go to the Table to fetch the third column, it decides to do a Full Table Scan as it has to read ALL the rows from the table. It avoids using the Index. That is sensible. However, note that the total "Bytes" fetched has gone up from 4541K to 5904K because of the additional column. What is worse is that the SORT operation requires 15MB of TempSpace and increases the total "COST". Apparently, the "COST" of the SORT operation is 1,578 (2429-851).
One additional column ("ITEM_TYPE" varchar2(19) -- doesn't seem very large ?) has increased the "COST" of my query from 873 to 2,429

If we compare the third and fourth queries the TempSpace for the SORT operation is now 55MB and the "COST" goes up from 852 to 6,597. Why such a big difference ? Although my Sort is supposed to be on only 2 columns, since Oracle fetches the full row, it has to effectively "sort" all the columns. It might have been better if we could pre-sort from the Index and, then, go to the Table, to fetch the rows in the sorted order. Would it ? Actually , no. Even if we use the Index to pre-sort and get ROWIDs in sorted order, we'd be introducing multiple read calls to then fetch the rows from the table -- a single block read call for each row. Clustering of the table might help reducing physical reads but not 'consistent gets'. I leave that experiement for you to try.

So what we get is :


1. Fetch Only 2 Indexed Columns in Sorted Order : COST 873, TempSpace NIL
3. Fetch 2+1 Columns in Sorted Order : COST 2,429, TempSpace 15M
4. Fetch ALL Columns without a Sort : COST 852, TempSpace NIL
5. Fetch ALL columns and Sort on 2 columns : COST 6,597, TempSpace 55M




The next time you write a query that does a "SELECT *" stop and think about how the
Row Size affects the "COST". If you add an "ORDER BY" stop again and think twice
as hard.

No comments: