THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Oracle 11g: Multi-Column Correlation Without Extended Stats

Today I’ve been trying to reproduce in an 11g database one of the problems I faced with 10g—one on those problems Riyaj described in his Multi-Column Correlation and Extended Stats in Oracle 11g post. And the fun part is that I wasn’t able to reproduce it. Yet just setting optimizer_features_enable='10.2.0.4' made it show up again. It was as though Oracle can detect Multi-Column Correlation without Extended Statistics. How is this possible?

You don’t need any complicated schemas to check this out for yourself; just one table as below:

create table x (
         a number,
         b number,
         c number);

begin
  for i in 1..1000 loop
    for j in 1..10 loop
      insert into x values (j,j,j);
    end loop;
  end loop;
end;
/

commit;

exec dbms_stats.gather_table_stats(-
                 user,-
                 'X');

Execute a query with a WHERE clause containing two correlated columns:

explain plan for select c from x where a=1 and b=1;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |   100 |   900 |     7   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| X	 |   100 |   900 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------

The optimizer estimates the number of rows returned as 100, which is equivalent to considering that each one of the conditions in the WHERE clause returns 1000 rows, and that those columns are not correlated. (In that case that’s obviously wrong, because a equals b). So, no change with 10g? Read the rest of this entry . . .

Multi-Column Correlation and Extended Stats in Oracle 11g

We all have encountered this situation many times before: the cost-based optimizer assumes no correlation between two columns (until 11g), and this has the effect of erroneously reducing the cardinality of a row source. Incorrect cardinality estimates are one of many root causes for SQL performance issues. Consider the following example.

This code creates a table and populates data with four columns:

create table t_vc as
select mod(n, 100) n1, mod(n, 100) n2 ,
mod(n, 50) n3 , mod(n, 20) n4
from (select level n from dual connect by level <= 10001);

The first two columns, n1 and n2, have a strong correlation: n1 is always equals to n2.

Let’s collect statistics with histograms on all columns.

begin
dbms_stats.gather_Table_stats( user, 'T_VC', estimate_percent => null, method_opt => 'for all columns size 254');
end;
/

Let’s explain the plan for the query, specifying one predicate, exactly 100 rows with a value of 10.

Read the rest of this entry . . .

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @sheeri: #confoo talk "Bending Queries to your Will with EXPLAIN" slides http://bit.ly/explainslides & handout
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more