THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Stats overflow

Stats overflow in Oracle is certainly something you should keep an eye out for, however, sometimes an overflow comes too early (and too unexpectedly).

Recently I’ve been puzzled with the performance difference observed between 11GR1 and 11GR2. As part of the investigation I’ve decided to compare session stats from both 11GR1 and 11GR2 and found that there is a big difference for session cursor cache hits. But the number of session cursor cache hits I’ve been observing in 11GR2 just didn’t made any sense while being absolutely fine in 11GR1. I’ve been running through a relatively low number of iterations and every time it kept coming back with some nonsensical number.

An overflow was certainly a possibility, though in 2010, it’s not like someone should be counting every bit especially if they didn’t in 11GR1! Anyway it was worth a shot so I’ve decided to check it out…

SQL> select statistic# from v$statname where name = 'session cursor cache hits';

STATISTIC#
----------
       498

SQL> create table z_t as select 1 n from dual;

Table created

SQL> set serveroutput on
SQL> declare
  2  	l_n1	number:=0;
  3  	l_n2	number:=0;
  4  begin
  5  	loop
  6  		for cur in (select n from z_t)
  7  		loop
  8  			null;
  9  		end loop;
 10
 11  		select value into l_n2 from v$mystat where statistic#=498;
 12
 13  		exit when l_n2 < l_n1;
 14
 15  		l_n1 := l_n2;
 16  	end loop;
 17
 18  	dbms_output.put_line(l_n1);
 19  	dbms_output.put_line(l_n2);
 20  end;
 21  /

65535
1

PL/SQL procedure successfully completed

Oh well… apparently session cursor cache hits is now backed up by 16-bit unsigned integer… uncool.

I don’t know whether any other stat experienced such a “downgrade” but for my particular case having it as 16-bit unsigned integer produced quite a bizarre results on the newest version of Oracle database where the previous one was behaving just fine.

Tests were done on Oracle 11.2.0.1 under Linux x64.

Where did the filter come from?

There has been a recent thread on ORACLE-L where the poster asked why there is an extra filter predicate which appears when functions are being used in the where clause.

We can observe the behavior using the following test case:

SQL> create table t
  2  (
  3  	n number,
  4  	dt date
  5  );

Table created

SQL> insert into t values (1, sysdate);

1 row inserted

SQL> commit;

Commit complete

SQL> select * from t where dt >= add_months(trunc(sysdate, 'mm'), -1) and dt < trunc(sysdate, 'mm');

Execution Plan
----------------------------------------------------------
Plan hash value: 1322348184

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    22 |     2   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T    |     1 |    22 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ADD_MONTHS(TRUNC(SYSDATE@!,'fmmm'),(-1))<TRUNC(SYSDATE@!,'
              fmmm'))
   2 - filter("DT"<TRUNC(SYSDATE@!,'fmmm') AND
              "DT">=ADD_MONTHS(TRUNC(SYSDATE@!,'fmmm'),(-1)))

Why there is an extra filter predicate against step# 1 in the plan?

The following predicate:

dt >= add_months(trunc(sysdate, 'mm'), -1) AND dt < trunc(sysdate, 'mm')

can be represented in the simplified form as:

X >= Y AND X < Z

The above predicate is true only when the following predicate is true:

Y < Z

In other words, If Y is greater than Z then the original predicate will always evaluate to false no matter what the value of X is. If we substitue the above with the actual values which we have:

X: DT
Y: add_months(trunc(sysdate, 'mm'), -1)
Z: trunc(sysdate, 'mm')
Y < Z: add_months(trunc(sysdate, 'mm'), -1) < trunc(sysdate, 'mm')

In other words, if the filter condition in step# 1 evaluates to false then there is no need to execute the query at all because no rows will come out regardless of what data we have in the dt column and Oracle can use it to short-circuit the execution. The reason that the optimizer have to execute that filter explicitly is because sysdate will look to it as any other non-deterministic function, i.e. you don’t know whether Y < Z or not until you substitute Y and Z with the actual values which will happen every time you execute the query.

Deferrable constraints in Oracle 11gR2 may lead to logically corrupted data

I’ve hit a bug in Oracle 11.2.0.1 when working with deferrable constraints which I think is worth sharing as it may have profound consequences under certain scenarios.

Let’s start by creating a simple table with a deferrable primary key:

SQL> create table def_bug(n number primary key deferrable initially deferred);

Table created

SQL> insert into def_bug values (1);

1 row inserted

SQL> insert into def_bug values (2);

1 row inserted

SQL> commit;

Commit complete

You can confirm that the primary key constraint is working fine by trying to insert a duplicate value:

SQL> insert into def_bug values (1);

1 row inserted

SQL> commit;

commit

ORA-02091: transaction rolled back
ORA-00001: unique constraint (SRC.SYS_C004070) violated

So far so good. Open a second session and execute the following update:

SQL> update def_bug set n=3 where n=2;

1 row updated

Do not commit yet and execute in your first session:

SQL> update def_bug set n=3 where n<=2;

The above update will block due to our second session holding a lock on the row where n=2. Now commit your second session…

SQL> update def_bug set n=3 where n=2;

1 row updated

SQL> commit;

Commit complete

…and then commit your first session:

SQL> update def_bug set n=3 where n<=2;

1 row updated

SQL> commit;

Commit complete

Take a look at the data now:

SQL> select * from def_bug;

         N
----------
         3
         3

Ouch! This was certainly unexpected. You can confirm that the primary key is still working by trying to insert a duplicate value again:

SQL> insert into def_bug values (3);

1 row inserted

SQL> commit;

commit

ORA-02091: transaction rolled back
ORA-00001: unique constraint (SRC.SYS_C004070) violated

It certainly looks like the update statement did not take into account deferrable constraint declared on the table during restart caused by the write consistency mechanism.

Oracle GoldenGate Extract Internals, Part III

This is the third post in Oracle GoldenGate Extract Internals series (links to part I and part II).

In this post, we’re going to take a closer look at various queries that the Extract process uses against the database. As before, we will start by examining the strace output:

nanosleep({1, 0}, NULL)                 = 0
...
read(20, "\1\"\0\0\255\1\0\0\217\0\0\0H\200\366\256\5\24\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024000) = 1024000
...
write(16, "\0$\0\0\6\0\0\0\0\0\3N'\7\0\0\0\2\0\0\0`\0\0\0\0\0\0\0\7\0011"..., 36) = 36
read(17, "\0\351\0\0\6\0\0\0\0\0\6\1\"\375\2\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 233
write(16, "\0 \0\0\6\0\0\0\0\0\3N(\10\0\0\0\2\0\0\0`\0\0\0\0\0\0\0\7\0011", 32) = 32
read(17, "\0\343\0\0\6\0\0\0\0\0\6\1\"\7\1\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 227
write(16, "\0K\0\0\6\0\0\0\0\0\3N)\t\0\0\0\2\0\0\0`\0\0\0\0\0\0\0\7,/"..., 75) = 75
read(17, "\0\341\0\0\6\0\0\0\0\0\6\1\"\375\1\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 225
write(16, "\0Q\0\0\6\0\0\0\0\0\3N*\n\0\0\0\2\0\0\0`\0\0\0\0\0\0\0\7,/"..., 81) = 81
read(17, "\0\254\0\0\6\0\0\0\0\0\4\1\0\0\0)\0\1\0\0\0\0{\5\0\0\0\0\n\0\0\0"..., 8208) = 172
lseek(20, 227328, SEEK_SET)             = 227328
nanosleep({1, 0}, NULL)                 = 0

File descriptors 16 and 17 are the pipes for one of the bequeath connections we have with the database. There are four queries being submitted each cycle (following the same order as those being sent to a database): Read the rest of this entry . . .

Oracle GoldenGate Extract Internals, Part II

Today we continue looking at various aspects of how the Oracle GoldenGate extract process works.

One of the follow up questions to part I was about the way the Extract process reads from ASM storage. I’ve provided the answer, however, today we’re going get a detailed look at how the Extract process interacts with an ASM instance and what kind of implications may result.

Read the rest of this entry . . .

Oracle GoldenGate Extract Internals, Part I

Since GoldenGate has been declared as a strategic direction for replication technology by Oracle, it sounds like it’s time to get up to speed with various aspects of how this technology works and performs.

As many of you are probably aware, up until recently, GoldenGate had been a third-party product. Technology-wise this presents an interesting challenge for the GoldenGate development team as they have to rely on whatever Oracle makes available to the outside world. Let’s see what kind of techniques they were able to utilize in order to achieve their goals.

I did a simple replication setup between two different databases with the Extract, DataPump and Replicat processes. I’m planning to take a look at all of these but today is the Extract’s process turn.

Oracle GoldenGate Extract Process
The main duty of the Extract process is to read and process Oracle redo logs in order to extract relevant changes and write these to a trail.

Reading from the Redo Logs
This is probably the most interesting aspect of the Extract process as this is where various technology stacks are being bridged together. Unless you configure Extract process to read strictly from the archived logs, it will try to read from online redo logs whenever possible. Let’s take a look at an strace of a running Extract process:

Read the rest of this entry . . .

Oracle Streams Apply Process changes in 11GR2

A couple of weeks ago Christo Kutrovsky mentioned to me about Oracle Streams presentation he saw on this year’s UKOUG. The presentation was from CERN’s Eva Dafonte Pérez and, among over things, Eva mentions about substantial performance enhancements observed in 11GR2.

It is somewhat timely that we’ve been doing some Oracle Golden Gate testing which in turn made me curious to take a closer look at Oracle Streams in 11GR2 and see where all the performance is coming from.

I’ve setup a simple replication for table t1 from schema src to schema dst, changed Apply Server parallelism to 1 and did a simple test with inserting 100 rows while performing a sql trace:
Read the rest of this entry . . .

Stabilize Oracle Bind Peeking Behaviour with Range-Based Predicates

In my previous post, I described the most common cause for unstable plans due to bind peeking — histograms. It is now time to move forward and take a look at another case, namely range-based predicates. Strictly speaking, the cases I’m going to describe can appear without range-based predicates as well, you just need to remember that a range-based operation doesn’t necessarily imply a range-based predicate.

How Can Range-Based Predicates Cause an Unstable Plan?

Quite easy, take the following example:

SQL> create table t as
  2  	select level n, rpad('x', 200, 'x') v
  3  		from dual
  4  		connect by level <= 100000;

Table created

SQL> create index i_t_n on t (n);

Index created

SQL> exec dbms_stats.gather_table_stats(user, 't');

PL/SQL procedure successfully completed

Now, I’ll query the table using two different conditions:

SQL> set autot traceonly explain
SQL> select * from t where n <= 100;

Execution Plan
-----------------------------------------------------
Plan hash value: 2912310446

----------------------------------------------------
| Id  | Operation                   | Name  | Rows |
----------------------------------------------------
|   0 | SELECT STATEMENT            |       |   95 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   95 |
|*  2 |   INDEX RANGE SCAN          | I_T_N |   95 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N"<=100)

SQL> select * from t where n <= 25000;

Execution Plan
------------------------------------------
Plan hash value: 1601196873

------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      | 24998 |
|*  1 |  TABLE ACCESS FULL| T    | 24998 |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N"<=25000)

(Note that I’ve trimmed the output for the sake of readability.)

The first query was executed using INDEX RANGE SCAN, the second one using TABLE ACCESS FULL. This makes perfect sense, since the first query is going to return only a small fraction of data in a table, while the second one is going to fetch substantially more data. As you probably already guessed, if you substitute a literal value for a bind variable, your plan will depend on what value was passed during a hard parse: Read the rest of this entry . . .

Oracle 11G Result Cache in the Real World

As some of you probably already noticed, there was a thread on AskTom discussing the scalability tests I did back in 2007. You are welcome to read the entire thread, but in a nutshell, Tom Kyte claimed that my tests did not reflect how one would use the result cache in the real world.

What is “real world?”

Of course, the important question is whether I tested a feature in a way it was never designed to be used, or whether someone is just trying to make an excuse for poor scalability results by defining “real world” in a way that makes my tests inappropriate.

A new feature

What do you do, then, you first see a new feature? You read about it in the documentation, and then you test it in order to compare what you have read with what you have in reality.

What the documentation tells us

Open the Performance Tuning Guide and go to 7.3.1.4 Result Cache Concepts:

When these queries and functions are executed repeatedly, the results are retrieved directly from the cache memory. This results in a faster response time. The cached results stored become invalid when data in the dependent database objects is modified. The use of the result cache is a database-wide decision.

All it says is that you have to have repeatedly-executed functions and queries to get faster response time. It says nothing about what kind of queries or functions. It also suggests that the result cache should be used database-wide or shouldn’t be used at all (which is perfectly sound according to Jonathan Lewis’s Rules for Hinting).

Now skip up to 7.3.2.7 Use of Result Cache:

OLTP applications can benefit significantly from the use of the result cache. The benefits highly depend on the application. Consider the use of the PL/SQL function result cache and the SQL query result cache when evaluating whether your application can benefit from the result cache.

It clearly says that result cache is perfectly appropriate for OLTP applications. They leave a backdoor with the words, “depend on the application” but, yet again, they say nothing about what kind of OLTP applications.

Read the rest of this entry . . .

Stabilize Oracle 10G’s Bind Peeking Behaviour by Cutting Histograms

I wrote this post because I feel there is a great need for it. The number of people struggling with unstable query plans due to bind peeking in Oracle 10G is enormous, to say the least. More than that, solutions like disabling bind variable peeking are driving us away from understanding the root cause of the problem and applying the right fix to it.

What are the causes of unstable plans due to bind variable peeking?

There are three things that might put you at risk of unstable plans due to bind variable peeking. Those are histograms, partitions, and range-based predicates. I’ll cover last two in upcoming blog posts.

Histograms

Let me share with you my déja  vu. When I see this:

SQL> select value
  2  	from v$parameter
  3  	where name='_optim_peek_user_binds';

VALUE
--------------------
FALSE

I immediately expect this:

SQL> select	sum(case when max_cnt > 2 then 1 else 0 end) histograms,
  2  		sum(case when max_cnt <= 2 then 1 else 0 end) no_histograms
  3  from (
  4  	select table_name, max(cnt) max_cnt
  5  		from (
  6  			select table_name, column_name, count(*) cnt
  7  				from dba_tab_histograms
  8  				group by table_name, column_name
  9  		) group by table_name
 10  );

HISTOGRAMS NO_HISTOGRAMS
---------- -------------
      1169          2494

The above is an example from a real-world OLTP system running with bind peeking disabled. It is no surprise to me. An exception, you say? Here’s another one . . . Read the rest of this entry . . .

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: @ghemant @pythian love your #hemantgiri
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more



Social links powered by Ecreative Internet Marketing