Lets take a look upon Join Cardinality which happens when we join two (or more) Table together to get desired result. You might be aware that Cardnilaity for Single Table
for access/filter predicate(no hitograms) comes from Selectivity*num rows. Idea would be same for Join cardinalities as well but there is sight different formula for
Join cardinality.
Formula Looks like:
Join cardinality = Join selectivity * Filtered cardinality of t1.filter * Filtered cardinality of t2.filter
Join selectivity = (numrows(t1) - null(t1.c1)) / numrows(t1) * (numrows(t2) - null(t2.c1)) / numrows(t2) / greater distnct(t1.c1,t2.c1)
Here T1 and T2 are two tables been joined on C1 columns in both tables repectivity.
T1.Filer is Filter aplied on T1 table (if any)
T2.Filer is Filter aplied on T2 table (if any)
Lets take a simple test to check the above formula
create table TAB2 as
select trunc(dbms_random.value(0,5000)) jn,
trunc(dbms_random.value(0, 40 )) v1,
rpad('x',100) padding
from all_objects where rownum <= 10000;
create table TAB1 as
select trunc(dbms_random.value(0,4)) jn,
trunc(dbms_random.value(0, 40 )) v1,
rpad('x',100) padding
from all_objects where rownum <= 10000;
Gathering stats(witout histograms)
execute dbms_stats.gather_table_stats(ownname=>user,tabname=>'TAB2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>user,tabname=>'TAB1',method_opt=>'for all columns size 1');
Below are the statistics of both table and its columns(ony required columns)
TABLE_NAME NUM_DISTINCT DENSITY COLUMN_NAME
------------------------------ ------------ ---------- ------------------------------
TAB1 4 .25 JN
TAB1 40 .025 V1
TAB2 4329 .000231 JN
TAB2 40 .025 V1
SQL> select count(*) from tab1, tab2 where tab1.jn=tab2.jn and tab1.v1=1;
COUNT(*)
----------
420
Execution Plan
----------------------------------------------------------
Plan hash value: 2043035240
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 89 (2)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 578 | 5780 | 89 (2)| 00:00:01 |
|* 3 | TABLE ACCESS STORAGE FULL| TAB1 | 250 | 1500 | 44 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| TAB2 | 10000 | 40000 | 44 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TAB1"."JN"="TAB2"."JN")
3 - storage("TAB1"."V1"=1)
filter("TAB1"."V1"=1)
Statistics
----------------------------------------------------------
18 recursive calls
0 db block gets
348 consistent gets
0 physical reads
0 redo size
527 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed
How come optimizer has assumed the Join selectvity at Line 2 of Explain plan as 578?????
Lets check
Join selectivity = ((10,000 - 0)/10,000 * (10,000 - 0)/10,000) / 4329 --Since 4329 is greater then 4
= 1/4329
Join Card = 1/4329*10,000*250 = 577.5 = Rounded to 578 - As expected
So formula seems to work in this condition for 11.2 version of oracle
Lets take a different set of example:
drop table tab1;
drop table tab2;
create table tab1 as
select
trunc(dbms_random.value(0, 500 )) filter,
trunc(dbms_random.value(0, 50 )) jn,
lpad(rownum,10) v1, rpad('x',100) padding
from
all_objects
where
rownum <= 1000;
create table tab2 as
select
trunc(dbms_random.value(0, 500 )) filter,
trunc(dbms_random.value(0, 60 )) jn,
lpad(rownum,10) v1, rpad('x',100) padding
from
all_objects
where
rownum <= 1000;
execute dbms_stats.gather_table_stats(ownname=>user,tabname=>'TAB2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>user,tabname=>'TAB1',method_opt=>'for all columns size 1');
SQL> select table_name,num_distinct, density, column_name from dba_tab_columns where table_name in('TAB1','TAB2') and column_name in ('JN','FILTER');
TABLE_NAME NUM_DISTINCT DENSITY COLUMN_NAME
------------------------------ ------------ ---------- ------------------------------
TAB1 438 .002283105 FILTER
TAB1 50 .02 JN
TAB2 441 .002267574 FILTER
TAB2 60 .016666667 JN
set autotrace on;
select t1.v1, t2.v1 from tab1 t1,tab2 t2 where t2.jn = t1.jn and t2.filter = 1;
28 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2953944933
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 45 | 1440 | 15 (7)| 00:00:01 |
|* 1 | HASH JOIN | | 45 | 1440 | 15 (7)| 00:00:01 |
|* 2 | TABLE ACCESS STORAGE FULL| TAB2 | 2 | 36 | 7 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| TAB1 | 1000 | 14000 | 7 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JN"="T1"."JN")
2 - storage("T2"."FILTER"=1)
filter("T2"."FILTER"=1)
Explain plan says Cardinality of 45??????
Lets confirm
Join Selectivity = 1/50 ( Since there are no nulls on both columns and reason i took 50 on other column despite its smaller then 60 is because Table TAB2 has already fliter on it)
Join Card = 1/50 * 1000 * 2.2 = 44 - (it took 1/50 from tab1 column)
This seems to work too!
References:
J.Lewis Cost-Based Fundamentals
Support.oracle.com
No comments:
Post a Comment