27 March, 2014

Storing Trailing NULLs in a table

Oracle has an optimization in that if, for a row, all trailing columns are NULL, it does not have to store the NULL in each column, but saves space by reducing the physical size of the row in the block (in effect "compressing the NULLs").  If, however, the NULLs appear in intermediate columns between the first column and the last column (with one or more non-NULL values intervening or at the end), it has to write one byte for each NULL, without any "compression".

Here is a simple demo with two tables holding 9 data columns each.  In 90% of the rows, 6 columns have NULL values.

In the first table, an intermediate column and the last column have non-NULL value, thus resulting in the columns having NULLs  interspersed with non-NULLs.

SQL> drop table intermediate_nulls;

Table dropped.

SQL> 
SQL> create table intermediate_nulls
  2  (
  3  id_column number,
  4  data_col_1 varchar2(25),
  5  data_col_2 varchar2(25),
  6  data_col_3 varchar2(25),
  7  data_col_4 varchar2(25),
  8  data_col_5 varchar2(25),
  9  data_col_6 varchar2(25),
 10  data_col_7 varchar2(25),
 11  data_col_8 varchar2(25),
 12  data_col_9 varchar2(25)
 13  )
 14  /

Table created.

SQL> 
SQL> REM  Insert 1 million rows
SQL> REM  9 in 10 rows have NULLs for cols 2 to 5, 7 to 8  but all have values in cols 1,6,9
SQL> insert into intermediate_nulls
  2  select rownum,
  3  'Col 1 Data',
  4  decode(mod(rownum,10),0,'Col 2 Data', null),
  5  decode(mod(rownum,10),0,'Col 3 Data', null),
  6  decode(mod(rownum,10),0,'Col 4 Data', null),
  7  decode(mod(rownum,10),0,'Col 5 Data', null),
  8  'Col 6 Data',
  9  decode(mod(rownum,10),0,'Col 7 Data', null),
 10  decode(mod(rownum,10),0,'Col 8 Data', null),
 11  'Col 9 data'
 12  from dual
 13  connect by level < 1000001
 14  /

1000000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('','INTERMEDIATE_NULLS');

PL/SQL procedure successfully completed.

SQL> 
SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'INTERMEDIATE_NULLS'
  4  /

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
   1000000          50       7678

SQL> 

In the second table, all 6 trailing columns have NULLs.

SQL> drop table trailing_nulls;

Table dropped.

SQL> 
SQL> create table trailing_nulls
  2  (
  3  id_column number,
  4  data_col_1 varchar2(25),
  5  data_col_2 varchar2(25),
  6  data_col_3 varchar2(25),
  7  data_col_4 varchar2(25),
  8  data_col_5 varchar2(25),
  9  data_col_6 varchar2(25),
 10  data_col_7 varchar2(25),
 11  data_col_8 varchar2(25),
 12  data_col_9 varchar2(25)
 13  )
 14  /

Table created.

SQL> 
SQL> 
SQL> REM  Insert 1 million rows
SQL> REM  9 in 10 rows have NULLs for cols 4 to 9 -- i.e. all trailing cols
SQL> insert into trailing_nulls
  2  select rownum,
  3  'Col 1 Data',
  4  'Col 2 Data',
  5  'Col 3 Data',
  6  decode(mod(rownum,10),0,'Col 4 Data', null),
  7  decode(mod(rownum,10),0,'Col 5 Data', null),
  8  decode(mod(rownum,10),0,'Col 6 Data', null),
  9  decode(mod(rownum,10),0,'Col 7 Data', null),
 10  decode(mod(rownum,10),0,'Col 8 Data', null),
 11  decode(mod(rownum,10),0,'Col 9 Data', null)
 12  from dual
 13  connect by level < 1000001
 14  /

1000000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('','TRAILING_NULLS');

PL/SQL procedure successfully completed.

SQL> 
SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'TRAILING_NULLS'
  4  /

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
   1000000          50       6922

SQL> 

We can see that the TRAILING_NULLS table consumes approximately 10% less disk space.
Note : The AVG_ROW_LEN is the small.  When the data is queried, NULLs (one byte each) are returned to the client.  It is in the data block where the "compression" is done.
.
.
.

23 March, 2014

Plan HASH_VALUE remains the same for the same Execution Plan, even if ROWS and COST change

Here is a simple demo that shows that the Plan Hash_Value does not consider the ROWS and COST but only the Execution Plan.  Thus, even with more rows added into a table, if the Execution Plan remains the same for a query, it is independent of the number of ROWS and the COST changing.

SQL> -- create the table
SQL> create table branch_list
  2  (country_code varchar2(3), branch_code number, branch_city varchar2(50));

Table created.

SQL> 
SQL> -- create an index
SQL> create index branch_list_cntry_ndx
  2  on branch_list(country_code);

Index created.

SQL> 
SQL> 
SQL> 
SQL> -- populate it with 100 rows, one third being 'IN'
SQL> insert into branch_list
  2  select decode(mod(rownum,3),0,'IN',1,'US',2,'US'), rownum, dbms_random.string('X',32)
  3  from dual
  4  connect by level < 101
  5  /

100 rows created.

SQL> 
SQL> -- gather statistics
SQL> exec dbms_stats.gather_table_stats('','BRANCH_LIST');

PL/SQL procedure successfully completed.

SQL> 
SQL> -- get an execution plan
SQL> explain plan for
  2  select branch_code, branch_city
  3  from branch_list
  4  where country_code = 'IN'
  5  /

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 513528032

-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |    50 |  1950 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BRANCH_LIST           |    50 |  1950 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BRANCH_LIST_CNTRY_NDX |    50 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COUNTRY_CODE"='IN')

14 rows selected.

SQL> 
SQL> 
SQL> -- add another 400 rows, none of them being 'IN'
SQL> insert into branch_list
  2  select decode(mod(rownum,6),0,'SG',1,'US',2,'US',3,'US',4,'AU',5,'UK'), rownum+100, dbms_random.string('X',32)
  3  from dual
  4  connect by level < 401
  5  /

400 rows created.

SQL> 
SQL> -- update statistics
SQL> exec dbms_stats.gather_table_stats('','BRANCH_LIST');

PL/SQL procedure successfully completed.

SQL> 
SQL> -- get the execution plan again
SQL> explain plan for
  2  select branch_code, branch_city
  3  from branch_list
  4  where country_code = 'IN'
  5  /

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 513528032

-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |    33 |  1320 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BRANCH_LIST           |    33 |  1320 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BRANCH_LIST_CNTRY_NDX |    33 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COUNTRY_CODE"='IN')

14 rows selected.

SQL> 
SQL> select column_name, histogram
  2  from user_tab_columns
  3  where table_name = 'BRANCH_LIST';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
COUNTRY_CODE                   FREQUENCY
BRANCH_CODE                    NONE
BRANCH_CITY                    NONE

SQL> select count(*) from user_tab_histograms
  2  where table_name = 'BRANCH_LIST'
  3  and column_name = 'COUNTRY_CODE'
  4  /

  COUNT(*)
----------
         5

SQL> 


After the addition of 400 rows to a 100 row table, the distribution of rows has changed. At the second Gather_Table_Stats call, Oracle has properly omputed a Frequency Histogram on the COUNTRY_CODE column for the 5 countries ('IN','US','SG','AU','UK').  The estimate for the COUNTRY_CODE='IN' is now more accurate.

However, what I want to demonstrate here is that although "ROWS" (and "BYTES for that many ROWS) and "COST" have changed in the new Execution Plan, the PLAN HASH VALUE ("513528032") remains the same.  Thus, the PLAN HASH VALUE is independent of changes to the ROWS/BYTES and COST.  The Execution Plan, per se, hasn't changed.
.
.
.



17 March, 2014

My slideshare site has had 1000 views

I have put up some presentations and articles on slideshare.  My site there has had 1,000 views.
.
.
.

01 March, 2014

"Dropping" an Index Partition

Here is a simple way to "drop" an Index Partition  using the 11.2 behaviour of DEFERRED_SEGMENT_CREATION  by dropping the segment for the Index Partition.

CORRECTION : Although I posted this as DEFERRED_SEGMENT_CREATION=TRUE behaviour, Marko has pointed out that it applies even when D_S_C is set to FALSE.  So I am posting two test cases, one with D_S_C set to TRUE, and one with it set to FALSE.

This is the first case with DEFERRED_SEGMENT_CREATION=TRUE

HEMANT>show parameter deferre

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            boolean     TRUE
HEMANT>
HHEMANT>l
  1  create table sales_history
  2      (sale_date date, product_id number, customer_id number, quantity number, price number, remarks varchar2(125))
  3      partition by range (sale_date)
  4      (partition P_2010 values less than (to_date('01-JAN-2011','DD-MON-YYYY')),
  5       partition P_2011 values less than (to_date('01-JAN-2012','DD-MON-YYYY')),
  6       partition P_2012 values less than (to_date('01-JAN-2013','DD-MON-YYYY')),
  7       partition P_2013 values less than (to_date('01-JAN-2014','DD-MON-YYYY')),
  8       partition P_2014 values less than (to_date('01-JAN-2015','DD-MON-YYYY')),
  9       partition P_2015 values less than (to_date('01-JAN-2016','DD-MON-YYYY')),
 10       partition P_MAX values less than (MAXVALUE)
 11* )
HEMANT>/

Table created.

HEMANT>
HEMANT>l
  1  create bitmap index sales_history_prdct_ndx
  2  on sales_history(product_id)
  3* local
HEMANT>/

Index created.

HEMANT>
HEMANT>select segment_name, partition_name    
  2  from user_segments
  3  where segment_type = 'INDEX PARTITION'
  4  and segment_name = 'SALES_HISTORY_PRDCT_NDX'
  5  /

no rows selected

HEMANT>

Since deferred_segment_creation is set to TRUE, no index partition segments exist till data is loaded.

HEMANT>l
  1  insert into sales_history
  2       select to_date('01-JUL-2010','DD-MON-RR')+ (rownum*365), mod(rownum,5)+1,mod(rownum,1000)+1, 100, 12,'Sale Done'
  3       from dual
  4*      connect by level < 5
HEMANT>/

4 rows created.

HEMANT>
HEMANT>l
  1  select segment_name, partition_name
  2  from user_segments
  3  where segment_type = 'INDEX PARTITION'
  4* and segment_name = 'SALES_HISTORY_PRDCT_NDX'
HEMANT>/

SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
SALES_HISTORY_PRDCT_NDX        P_2011
SALES_HISTORY_PRDCT_NDX        P_2012
SALES_HISTORY_PRDCT_NDX        P_2013
SALES_HISTORY_PRDCT_NDX        P_2014

HEMANT>

So, now 4 index partitions are populated for the 4 rows in the 4 table partitions.

What happens when I make an Index Partition UNUSABLE ?

HEMANT>commit;

Commit complete.

HEMANT>alter index sales_history_prdct_ndx modify partition P_2011 unusable;

Index altered.

HEMANT>
HEMANT>l
  1  select segment_name, partition_name
  2  from user_segments
  3  where segment_type = 'INDEX PARTITION'
  4* and segment_name = 'SALES_HISTORY_PRDCT_NDX'
HEMANT>/

SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
SALES_HISTORY_PRDCT_NDX        P_2012
SALES_HISTORY_PRDCT_NDX        P_2013
SALES_HISTORY_PRDCT_NDX        P_2014

HEMANT>

The corresponding Index Partition Segment has also "disappeared".  I have released the space that was used by the Index Partition without actually deleting rows from the table.
This is possible with deferred_segment_creation set to TRUE.

Next is the test case with DEFERRED_SEGMENT_CREATION set to FALSE.
Here you will notice that Partitions with zero rows (i.e. those that are empty) still have Segments -- thus all the Partitions are created upfront even in the absence of rows in the table (and index).

HEMANT> show parameter deferred

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            boolean     FALSE
HEMANT>
HEMANT> create table transactions_history
  2  (txn_id number, txn_date date, txn_product_id number, txn_value number, remarks varchar2(50))
  3  partition by range (txn_date)
  4  (partition P_2010 values less than (to_date('01-JAN-2011','DD-MON-YYYY')),
  5           partition P_2011 values less than (to_date('01-JAN-2012','DD-MON-YYYY')),
  6           partition P_2012 values less than (to_date('01-JAN-2013','DD-MON-YYYY')),
  7           partition P_2013 values less than (to_date('01-JAN-2014','DD-MON-YYYY')),
  8           partition P_2014 values less than (to_date('01-JAN-2015','DD-MON-YYYY')),
  9           partition P_2015 values less than (to_date('01-JAN-2016','DD-MON-YYYY')),
 10          partition P_MAX values less than (MAXVALUE)
 11  )
 12  /

Table created.

HEMANT>
HEMANT> create bitmap index txn_hist_prdct_id on
  2  transactions_history(txn_product_id)
  3  local
  4  /

Index created.

HEMANT>
HEMANT> select segment_name, partition_name
  2  from user_segments
  3  where segment_name = 'TXN_HIST_PRDCT_ID'
  4  and segment_type = 'INDEX PARTITION'
  5  order by 2
  6  /

SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
TXN_HIST_PRDCT_ID              P_2010
TXN_HIST_PRDCT_ID              P_2011
TXN_HIST_PRDCT_ID              P_2012
TXN_HIST_PRDCT_ID              P_2013
TXN_HIST_PRDCT_ID              P_2014
TXN_HIST_PRDCT_ID              P_2015
TXN_HIST_PRDCT_ID              P_MAX

7 rows selected.

HEMANT>
HEMANT> insert into transactions_history
  2  select rownum, to_date('01-JUL-2010','DD-MON-RR')+ (rownum*365), mod(rownum,5)+1,rownum*100,'Txn Done'
  3  from dual
  4  connect by level < 5
  5  /

4 rows created.

HEMANT>
HEMANT> select segment_name, partition_name
  2  from user_segments
  3  where segment_name = 'TXN_HIST_PRDCT_ID'
  4  and segment_type = 'INDEX PARTITION'
  5  order by 2
  6  /

SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
TXN_HIST_PRDCT_ID              P_2010
TXN_HIST_PRDCT_ID              P_2011
TXN_HIST_PRDCT_ID              P_2012
TXN_HIST_PRDCT_ID              P_2013
TXN_HIST_PRDCT_ID              P_2014
TXN_HIST_PRDCT_ID              P_2015
TXN_HIST_PRDCT_ID              P_MAX

7 rows selected.

HEMANT>
HEMANT> alter index txn_hist_prdct_id modify partition P_2011 unusable;

Index altered.

HEMANT>
HEMANT> select segment_name, partition_name
  2  from user_segments
  3  where segment_name = 'TXN_HIST_PRDCT_ID'
  4  and segment_type = 'INDEX PARTITION'
  5  order by 2
  6  /

SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
TXN_HIST_PRDCT_ID              P_2010
TXN_HIST_PRDCT_ID              P_2012
TXN_HIST_PRDCT_ID              P_2013
TXN_HIST_PRDCT_ID              P_2014
TXN_HIST_PRDCT_ID              P_2015
TXN_HIST_PRDCT_ID              P_MAX

6 rows selected.

HEMANT>
HEMANT> select *
  2  from transactions_history
  3  order by txn_date
  4  /

    TXN_ID TXN_DATE  TXN_PRODUCT_ID  TXN_VALUE
---------- --------- -------------- ----------
REMARKS
--------------------------------------------------
         1 01-JUL-11              2        100
Txn Done

         2 30-JUN-12              3        200
Txn Done

         3 30-JUN-13              4        300
Txn Done

         4 30-JUN-14              5        400
Txn Done


HEMANT>


Here we see that setting the P_2011 Partition UNUSABLE resulting it in being dropped. The segments for Partitions P_2010, P_2015 and P_MAX persist even though they are empty. .
.
.
.