20 October, 2011

The impact of ASSM on Clustering of data -- 2

Revisiting my earlier test on the same topic, I add a change to "slow" down the operations. The fetching of sequence values with the default cache was so fast in the earlier test, that the three sessions that I manually invoked to execute concurrently did not show a real distribution of data. The first session had retrieved very many sequence values and inserted rows before the second session had even started ... and the third session was "far behind" -- this was because my cycling through the 3 terminal windows and hitting the ENTER button to start the executions of the procedures was not fast enough.

So, in this "twist" I deliberately set the sequences to NOCACHE, thus causing each INSERT to be much slower. Now, there is a much higher degree of concurrency of inserts.

Remember : This experiment was to demonstrate how ASSM "distributes" rows across different blocks. 3 different sessions concurrently inserting in MSSM will be hitting a single free list but will likely be inserting across different blocks in ASSM.

So, here is the test :



SQL> connect / as sysdba
Connected.
SQL>
SQL>
SQL> -- demo behaviour of ASSM with multiple sessions inserting concurrently
SQL> -- data gets "distributed" to reduce block-contention
SQL> -- but this affects the Clustering Factor !
SQL>
SQL>
SQL> drop tablespace ASSM_TBS including contents and datafiles;

Tablespace dropped.

SQL> create tablespace ASSM_TBS extent management local autoallocate segment space management auto;

Tablespace created.

SQL> drop tablespace MSSM_TBS including contents and datafiles;

Tablespace dropped.

SQL> create tablespace MSSM_TBS extent management local autoallocate segment space management manual;

Tablespace created.

SQL>
SQL>
SQL> -- grants required for stored procedures to access v$process, v$session, v$mystat
SQL> grant select on v_$process to hemant;

Grant succeeded.

SQL> grant select on v_$session to hemant;

Grant succeeded.

SQL> grant select on v_$mystat to hemant;

Grant succeeded.

SQL>
SQL>
SQL> -- Reconnect as HEMANT
SQL> connect hemant/hemant
Connected.
SQL>
SQL> -- setup the Invoices table and PK index and sequence
SQL> drop table ASSM_Table;
drop table ASSM_Table
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> create table ASSM_Table (
2 invoice_id number not null,
3 invoice_date date,
4 invoice_details varchar2(50),
5 inserting_pid number)
6 tablespace ASSM_TBS
7 /

Table created.

SQL> create unique index ASSM_Table_PK on ASSM_Table(invoice_id) tablespace ASSM_TBS;

Index created.

SQL> alter table ASSM_Table add constraint ASSM_Table_PK primary key (invoice_id);

Table altered.

SQL> drop sequence ASSM_invoice_Seq ;

Sequence dropped.

SQL> create sequence ASSM_invoice_Seq start with 1 increment by 1 nocache ;

Sequence created.

SQL>
SQL>
SQL> drop table MSSM_Table;
drop table MSSM_Table
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> create table MSSM_Table (
2 invoice_id number not null,
3 invoice_date date,
4 invoice_details varchar2(50),
5 inserting_pid number)
6 tablespace MSSM_TBS
7 /

Table created.

SQL> create unique index MSSM_Table_PK on MSSM_Table(invoice_id) tablespace MSSM_TBS;

Index created.

SQL> alter table MSSM_Table add constraint MSSM_Table_PK primary key (invoice_id);

Table altered.

SQL> drop sequence MSSM_invoice_Seq ;

Sequence dropped.

SQL> create sequence MSSM_invoice_Seq start with 1 increment by 1 nocache;

Sequence created.

SQL>
SQL>
SQL> -- create procedures that will simulate an application server
SQL> -- or ETL engine that inserts 500 rows
SQL> create or replace procedure populate_ASSM_Table
2 as
3 i number;
4 begin
5 for i in 1..501
6 loop
7 insert into ASSM_Table
8 select ASSM_invoice_seq.nextval, sysdate+rownum/10000, rpad('A',40,'X'), pid
9 from
10 v$process p, v$session s
11 where p.addr=s.paddr
12 and s.sid=(select sid from v$mystat where rownum=1)
13 ;
14 end loop;
15 commit;
16 end;
17 /

Procedure created.

SQL>
SQL> create or replace procedure populate_MSSM_Table
2 as
3 i number;
4 begin
5 for i in 1..501
6 loop
7 insert into MSSM_Table
8 select MSSM_invoice_seq.nextval, sysdate+rownum/10000, rpad('A',40,'X'), pid
9 from
10 v$process p, v$session s
11 where p.addr=s.paddr
12 and s.sid=(select sid from v$mystat where rownum=1)
13 ;
14 end loop;
15 commit;
16 end;
17 /

Procedure created.

SQL>
SQL>
SQL> REM now run the procedures from three separate sessions
SQL> pause .............. wait for the procedures to be executed
.............. wait for the procedures to be executed

SQL>
SQL>
SQL> -- gather statistics on the tables and indexes
SQL> exec dbms_Stats.gather_table_stats('','ASSM_TABLE',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> exec dbms_Stats.gather_table_stats('','MSSM_TABLE',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> -- view statistics on the tables and indexes
SQL> select table_name, num_rows, blocks
2 from user_tables
3 where table_name in ('ASSM_TABLE','MSSM_TABLE')
4 order by 1
5 /

TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
ASSM_TABLE 1503 13
MSSM_TABLE 1503 15

SQL>
SQL> select index_name, num_rows, blevel, leaf_blocks, clustering_factor
2 from user_indexes
3 where index_name in ('ASSM_TABLE_PK','MSSM_TABLE_PK')
4 order by 1
5 /

INDEX_NAME NUM_ROWS BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- -----------------
ASSM_TABLE_PK 1503 1 4 302
MSSM_TABLE_PK 1503 1 3 17

SQL>
SQL>
SQL> -- view the number of distinct blocks
SQL> select inserting_pid,count(distinct(dbms_rowid.rowid_block_number(rowid)))
2 from ASSM_TABLE
3 group by inserting_pid
4 order by 1
5 /

INSERTING_PID COUNT(DISTINCT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)))
------------- -----------------------------------------------------
23 6
29 7
30 5

SQL>
SQL> select inserting_pid,count(distinct(dbms_rowid.rowid_block_number(rowid)))
2 from MSSM_TABLE
3 group by inserting_pid
4 order by 1
5 /

INSERTING_PID COUNT(DISTINCT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)))
------------- -----------------------------------------------------
23 7
29 6
30 8

SQL>

Notice how the number of blocks for the table and for each inserting process is generally the same for the ASSM table and the MSSM table. But the CLUSTERING_FACTOR is now significantly higher in the ASSM table.
In MSSM, each new row (higher sequence value) almost always went in the same block as the previous row, except when the block was full. Thus a table with 15 blocks has a CLUSTERING_FACTOR of only 17. These rows are very well clustered.
In ASSM, on the other hand, the CLUSTERING_FACTOR is 302 ! Much higher spread across blocks. Given a row for INVOICE_ID (i.e. the sequence-based value) 'n', the likelihood of the row for INVOICE_ID 'n+1' being in the same block is much lower in ASSM than in MSSM. An Index Range Scan will be more "expensive" in the ASSM Table.

.
.
.

No comments: