<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Pythian Blog &#187; Alex Fatkulin</title>
	<atom:link href="http://www.pythian.com/news/author/alexf/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pythian.com/news</link>
	<description>News and views from Pythian DBAs</description>
	<lastBuildDate>Mon, 15 Mar 2010 21:40:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Deferrable constraints in Oracle 11gR2 may lead to logically corrupted data</title>
		<link>http://www.pythian.com/news/9881/deferrable-constraints-in-oracle-11gr2-may-lead-to-logically-corrupted-data/</link>
		<comments>http://www.pythian.com/news/9881/deferrable-constraints-in-oracle-11gr2-may-lead-to-logically-corrupted-data/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 21:40:17 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Technical Blog]]></category>
		<category><![CDATA[11g]]></category>
		<category><![CDATA[Oracle 11g]]></category>

		<guid isPermaLink="false">http://www.pythian.com/news/?p=9881</guid>
		<description><![CDATA[I&#8217;ve hit a bug in Oracle 11.2.0.1 when working with deferrable constraints which I think is worth sharing as it may have profound consequences under certain scenarios.
Let&#8217;s start by creating a simple table with a deferrable primary key:
SQL&#62; create table def_bug(n number primary key deferrable initially deferred);

Table created

SQL&#62; insert into def_bug values (1);

1 row inserted

SQL&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve hit a bug in Oracle 11.2.0.1 when working with deferrable constraints which I think is worth sharing as it may have profound consequences under certain scenarios.</p>
<p>Let&#8217;s start by creating a simple table with a deferrable primary key:</p>
<pre class="brush: sql;">SQL&gt; create table def_bug(n number primary key deferrable initially deferred);

Table created

SQL&gt; insert into def_bug values (1);

1 row inserted

SQL&gt; insert into def_bug values (2);

1 row inserted

SQL&gt; commit;

Commit complete</pre>
<p>You can confirm that the primary key constraint is working fine by trying to insert a duplicate value:</p>
<pre class="brush: sql;">SQL&gt; insert into def_bug values (1);

1 row inserted

SQL&gt; commit;

commit

ORA-02091: transaction rolled back
ORA-00001: unique constraint (SRC.SYS_C004070) violated</pre>
<p>So far so good. Open a second session and execute the following update:</p>
<pre class="brush: sql;">SQL&gt; update def_bug set n=3 where n=2;

1 row updated</pre>
<p>Do not commit yet and execute in your first session:</p>
<pre class="brush: sql;">SQL&gt; update def_bug set n=3 where n&lt;=2;</pre>
<p>The above update will block due to our second session holding a lock on the row where <em>n=2</em>. Now commit your second session&#8230;</p>
<pre class="brush: sql; highlight: [5];">SQL&gt; update def_bug set n=3 where n=2;

1 row updated

SQL&gt; commit;

Commit complete</pre>
<p> &#8230;and then commit your first session:</p>
<pre class="brush: sql; highlight: [5];">SQL&gt; update def_bug set n=3 where n&lt;=2;

1 row updated

SQL&gt; commit;

Commit complete</pre>
<p>Take a look at the data now:</p>
<pre class="brush: sql;">SQL&gt; select * from def_bug;

         N
----------
         3
         3</pre>
<p>Ouch! This was certainly unexpected. You can confirm that the primary key is still working by trying to insert a duplicate value again:</p>
<pre class="brush: sql;">SQL&gt; insert into def_bug values (3);

1 row inserted

SQL&gt; commit;

commit

ORA-02091: transaction rolled back
ORA-00001: unique constraint (SRC.SYS_C004070) violated</pre>
<p>It certainly looks like the update statement did not take into account the deferrable constraint declared on the table during the restart caused by the write consistency mechanism.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/9881/deferrable-constraints-in-oracle-11gr2-may-lead-to-logically-corrupted-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle GoldenGate Extract Internals, Part III</title>
		<link>http://www.pythian.com/news/7617/oracle-goldengate-extract-internals-part-iii/</link>
		<comments>http://www.pythian.com/news/7617/oracle-goldengate-extract-internals-part-iii/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 22:27:32 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Technical Blog]]></category>
		<category><![CDATA[GoldenGate]]></category>
		<category><![CDATA[internals]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[tracing]]></category>

		<guid isPermaLink="false">http://www.pythian.com/news/?p=7617</guid>
		<description><![CDATA[This is the third post in Oracle GoldenGate Extract Internals series (links to part I and part II).
In this post, we&#8217;re going to take a closer look at various queries that the Extract process uses against the database. As before, we will start by examining the strace output:

nanosleep({1, 0}, NULL)      [...]]]></description>
			<content:encoded><![CDATA[<p>This is the third post in Oracle GoldenGate Extract Internals series (links to <a href="http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i">part I</a> and <a href="http://www.pythian.com/news/7459/oracle-goldengate-extract-internals-part-ii/">part II</a>).</p>
<p>In this post, we&#8217;re going to take a closer look at various queries that the Extract process uses against the database. As before, we will start by examining the <em>strace</em> output:</p>
<pre class="brush: bash; highlight: [5,7,9,11];">
nanosleep({1, 0}, NULL)                 = 0
...
read(20, &quot;\1\&quot;&#92;&#48;&#92;&#48;\255\1&#92;&#48;&#92;&#48;\217&#92;&#48;&#92;&#48;&#92;&#48;H\200\366\256\5\24&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 1024000) = 1024000
...
write(16, &quot;&#92;&#48;$&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3N'\7&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;`&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7&#92;&#48;011&quot;..., 36) = 36
read(17, &quot;&#92;&#48;\351&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\6\1\&quot;\375\2&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 233
write(16, &quot;&#92;&#48; &#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3N(\10&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;`&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7&#92;&#48;011&quot;, 32) = 32
read(17, &quot;&#92;&#48;\343&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\6\1\&quot;\7\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 227
write(16, &quot;&#92;&#48;K&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3N)\t&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;`&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7,/&quot;..., 75) = 75
read(17, &quot;&#92;&#48;\341&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\6\1\&quot;\375\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 225
write(16, &quot;&#92;&#48;Q&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3N*\n&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;`&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7,/&quot;..., 81) = 81
read(17, &quot;&#92;&#48;\254&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\4\1&#92;&#48;&#92;&#48;&#92;&#48;)&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;{\5&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\n&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 172
lseek(20, 227328, SEEK_SET)             = 227328
nanosleep({1, 0}, NULL)                 = 0
</pre>
<p>File descriptors 16 and 17 are the pipes for one of the bequeath connections we have with the database. There are four queries being submitted each cycle (following the same order as those being sent to a database): <span id="more-7617"></span></p>
<pre class="brush: sql;">
SELECT DECODE(archived, 'YES', 1, 0), status  FROM v$log WHERE thread# = :ora_thread AND sequence# = :ora_seq_no

SELECT MAX(sequence#)  FROM v$log WHERE thread# = :ora_thread AND status in ('INVALIDATED', 'CURRENT', 'ACTIVE')

SELECT DECODE(status, 'STALE', 1, 0) FROM v$logfile WHERE member = :log_name

SELECT 1  FROM V$LOGFILE WHERE(STATUS NOT IN ('STALE', 'INVALID') OR STATUS IS NULL) AND MEMBER &lt;&gt; :log_name AND EXISTS ( SELECT 1 FROM V$LOG WHERE GROUP#  = V$LOGFILE.GROUP# AND THREAD# = :ora_thread AND SEQUENCE# = :ora_seq_no ) AND ROWNUM = 1
</pre>
<p>The purpose of these statements is to constantly keep an eye on what&#8217;s happening inside the database by regularly polling the contents of the above views. What&#8217;s worth mentioning about the above queries is that all of them will cause extra I/O to the controlfile. On my test database, that equaled 640KB each cycle (40 I/O requests, 16KB each). In most cases, this is nothing to worry about; just keep the additional I/O in mind in case your controlfile is already a hot spot.</p>
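<p>To put the controlfile figure into perspective, here is a minimal back-of-the-envelope sketch. It uses the numbers measured above and assumes one polling cycle per second, as the <em>nanosleep</em> calls in the trace suggest; the function and constant names are made up for illustration.</p>

```python
# Figures taken from the post: 40 control file I/O requests of 16KB per polling cycle.
REQUESTS_PER_CYCLE = 40
REQUEST_SIZE_KB = 16

# Assumption: one cycle per second, as the nanosleep({1, 0}) calls in the strace output suggest.
CYCLES_PER_HOUR = 3600


def controlfile_io_kb(cycles):
    """Return the control file I/O volume in KB for the given number of polling cycles."""
    return cycles * REQUESTS_PER_CYCLE * REQUEST_SIZE_KB


per_cycle_kb = controlfile_io_kb(1)
per_hour_mb = controlfile_io_kb(CYCLES_PER_HOUR) / 1024

print(per_cycle_kb)   # 640
print(per_hour_mb)    # 2250.0
```

<p>Under these assumptions a single Extract process adds roughly 2.2GB of controlfile reads per hour, which is small on most systems but not free.</p>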
<p>The redo log stores object identifiers (a number), which means that when  the Extract process encounters a supported operation, it needs a way to find out more details. This is achieved by a couple of statements against the data dictionary. The following statement will be issued first:</p>
<pre class="brush: sql; gutter: false;">
SELECT u.name, o.name, o.dataobj#, o.type#, (SELECT bitand(t.property, 1) FROM sys.tab$ t WHERE t.obj# = :ora_object_id) FROM sys.obj$ o, sys.user$ u WHERE o.obj# = :ora_object_id  AND decode(bitand(o.flags, 128), 128, 'YES', 'NO') = 'NO'  AND o.owner# = u.user# AND decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N') = 'N' AND (o.type# in (1, 19, 20, 34) OR EXISTS (SELECT 'x' FROM sys.tab$ t WHERE t.obj# = :ora_object_id))
</pre>
<p>In case the object turns out to be a table, it will be checked whether it is an overflow segment for an IOT:</p>
<pre class="brush: sql; gutter: false;">
SELECT nvl(iot_name, 'NULL')   FROM all_tables WHERE owner = :owner AND table_name = :object_name
</pre>
<p>This allows the Extract process to figure out whether it needs to process changes, in case the overflow segment belongs to an IOT from which we&#8217;re capturing the data. In case the object in question turns out to be an index, a corresponding check will be made to see whether it&#8217;s an underlying index for an IOT:</p>
<pre class="brush: sql; gutter: false;">
SELECT table_owner, table_name FROM all_indexes WHERE index_name = :object_name AND         owner = :owner AND index_type = 'IOT - TOP'
</pre>
<p>This is required so that the changes made to an IOT can be captured, in case it belongs to the list of tables we&#8217;re interested in.</p>
<p>The above queries will be executed regardless of whether or not you&#8217;re interested in capturing changes from the particular object,  because the queries are required <em>before</em> you can make that decision. In case this is something we&#8217;re interested in, additional information will be requested:</p>
<pre class="brush: sql; gutter: false;">
select object_type, object_name, subobject_name from dba_objects where object_id = :ora_object_id
</pre>
<p>The above statement is necessary in case we&#8217;re dealing with a partitioned object and, depending on the result, one of the following two statements will be executed:</p>
<pre class="brush: sql;">select ts.bigfile from dba_tablespaces ts,  all_tables t  where t.table_name = :ora_object_name and  t.tablespace_name = ts.tablespace_name and rownum = 1

select t.bigfile from dba_tablespaces t,  all_tab_partitions p  where p.partition_name = :ora_subobject_name and   p.tablespace_name = t.tablespace_name and rownum=1
</pre>
<p>There is obviously an issue with the above two statements. Neither of them specifies the object owner, so if you have two (or more) objects with the same name in different schemas, the above statements may return incorrect information when these objects are located in different tablespace types.</p>
<p>It&#8217;s interesting how the issue is shoved away using the <code>rownum = 1</code> condition. What could the potential impact be? One thought that immediately comes to mind is that the way <code>ROWIDs</code> are organized is different between small- and big-file tablespaces (the part being used for a relative data file number in a small file tablespace is used for a block number in case of a big file tablespace), so some functionality that potentially relies on that could be affected. I&#8217;ve got a couple of ideas, but I&#8217;ll hold these until I do some testing.</p>
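<p>The ROWID difference can be illustrated with a small sketch. It assumes the commonly documented extended ROWID layout, in which a smallfile tablespace splits a 32-bit field into a 10-bit relative file number and a 22-bit block number, while a bigfile tablespace treats all 32 bits as a block number. The function names and the sample value are hypothetical.</p>

```python
# Widths of the ROWID components (assumed: the commonly documented extended ROWID layout).
FILE_BITS = 10    # relative data file number (smallfile tablespaces only)
BLOCK_BITS = 22   # block number within the file (smallfile tablespaces)


def decode_smallfile(value):
    """Split a 32-bit value into (relative file number, block number)."""
    return divmod(value, 2 ** BLOCK_BITS)


def decode_bigfile(value):
    """In a bigfile tablespace the same 32 bits are all block number."""
    return value % 2 ** (FILE_BITS + BLOCK_BITS)


# Hypothetical example value: relative file 4, block 130 in a smallfile tablespace.
raw = 4 * 2 ** BLOCK_BITS + 130

print(decode_smallfile(raw))  # (4, 130)
print(decode_bigfile(raw))    # 16777346
```

<p>The same bits point to two very different blocks depending on which tablespace type is assumed, which is why picking up the <em>bigfile</em> flag from the wrong object could matter.</p>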
<p>What else is interesting? The big thing is that none of the information on columns is being resolved. All I&#8217;m going to say right now is that column information will be fetched by the Replicat process using the destination system&#8217;s data dictionary. Combine this with the fact that an online data dictionary does not store historical information about an object&#8217;s metadata, and you have a perfect recipe to get yourself into various nasty situations (which is exactly the reason why Oracle Streams relies on MVDD instead of the online data dictionary). But more on that when we get to the Replicat process internals series.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/7617/oracle-goldengate-extract-internals-part-iii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle GoldenGate Extract Internals, Part II</title>
		<link>http://www.pythian.com/news/7459/oracle-goldengate-extract-internals-part-ii/</link>
		<comments>http://www.pythian.com/news/7459/oracle-goldengate-extract-internals-part-ii/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 21:01:55 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Group Blog Posts]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Technical Blog]]></category>
		<category><![CDATA[GoldenGate]]></category>
		<category><![CDATA[internals]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[tracing]]></category>

		<guid isPermaLink="false">http://www.pythian.com/news/?p=7459</guid>
		<description><![CDATA[Today we continue looking at various aspects of how the Oracle GoldenGate extract process works.
One of the follow up questions to part I was about the way the Extract process reads from ASM storage. I&#8217;ve provided the answer, however, today we&#8217;re going to get a detailed look at how the Extract process interacts with an ASM [...]]]></description>
			<content:encoded><![CDATA[<p>Today we continue looking at various aspects of how the Oracle GoldenGate extract process works.</p>
<p>One of the follow up questions to <a href="http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i">part I</a> was about the way the Extract process reads from ASM storage. I&#8217;ve provided the answer; however, today we&#8217;re going to get a detailed look at how the Extract process interacts with an ASM instance and what kind of implications may result.</p>
<p><span id="more-7459"></span></p>
<p>Let&#8217;s take a look at what changes in the Extract process loop when it reads from ASM storage:</p>
<pre class="brush: bash;">nanosleep({1, 0}, NULL)                 = 0
...
write(18, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3^\v)\4\4&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;\376\377\377\377\377\377\377\377C&#92;&#48;&#92;&#48;&quot;..., 8155) = 8155
write(18, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1\1&#92;&#48;Z\7\n&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1\2&#92;&#48;&#92;&#48;W\7\n&#92;&#48;\237&#92;&#48;&#92;&#48;&quot;..., 8155) = 8155
write(18, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1&#92;&#48;\4&#92;&#48;&#92;&#48;&#92;&#48;\237&#92;&#48;&#92;&#48;\1u\7\n&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1\2\235\336&quot;..., 8155) = 8155
write(18, &quot;\22\367&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;.\1\300&#92;&#48;\204&#92;&#48;\n&#92;&#48;&#92;&#48;\200&#92;&#48;&#92;&#48;\217\7\n&#92;&#48;\237&#92;&#48;&#92;&#48;\1\232&quot;..., 4855) = 4855
read(18, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\v\1\5\276\4&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\260[\211\312&quot;..., 8208) = 8208
read(18, &quot;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8155) = 8155
read(18, &quot;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8155) = 8155
read(18, &quot;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8155) = 4540
...
nanosleep({1, 0}, NULL)                 = 0</pre>
<p>As before,  to make the loop a bit more understandable, I've removed a bunch of <em>syscalls</em>. Let's take a look at the file descriptor 18:</p>
<pre class="brush: bash;">[root@gg1 fd]# ls -l 18
lrwx------ 1 oracle oinstall 64 Jan 24 20:56 18 -&gt; socket:[24439]
</pre>
<p>Let&#8217;s find out more about this socket&#8230;</p>
<pre class="brush: bash;">[root@gg1 fd]# lsof -p 7725 | grep 18u
extract 7725 oracle   18u  IPv6              24439                TCP [::1]:16428-&gt;[::1]:ncube-lm (ESTABLISHED)
[root@gg1 fd]# lsof -i :16428
COMMAND  PID   USER   FD   TYPE DEVICE SIZE NODE NAME
extract 7725 oracle   18u  IPv6  24439       TCP [::1]:16428-&gt;[::1]:ncube-lm (ESTABLISHED)
oracle  7753 oracle   16u  IPv6  24440       TCP [::1]:ncube-lm-&gt;[::1]:16428 (ESTABLISHED)
[root@gg1 fd]# ps -f -p 7753
UID        PID  PPID  C STIME TTY          TIME CMD
oracle    7753     1  0 20:56 ?        00:00:00 oracle+ASM (LOCAL=NO)
</pre>
<p>In place of the <em>read syscalls</em> there is now some sort of communication going on with the ASM instance.</p>
<p>Alas, when it comes to tracing, an ASM instance is not the most friendly thing you can work with. But there are other ways. Enabling SQL*Net trace is enough to see what replaced the <em>read syscall</em>:</p>
<pre class="brush: sql;">BEGIN dbms_diskgroup.read(:handle, :offset, :length, :buffer); END;</pre>
<p>This is nothing other than one of the ASM internal packages which can be used to read directly from an ASM disk group. How is the <em>:handle</em> being obtained? If we go a little bit up the trace, we can find the answer:</p>
<pre class="brush: sql;">BEGIN dbms_diskgroup.getfileattr('+REDO/gg1/onlinelog/group_2.2547.708989191', :filetype, :filesize, :lblksize); END</pre>
<p>The above code is used to get file attributes which are necessary to call the <em>dbms_diskgroup.open</em> procedure:</p>
<pre class="brush: sql;">BEGIN dbms_diskgroup.open('+REDO/gg1/onlinelog/group_2.257.708989191', 'r', :filetype, :lblksize, :handle, :pblksize, :filesize); END;</pre>
<p>This call will return a <em>:handle</em> which can later be used in <em>dbms_diskgroup.read</em>. The name of the online redo log file to read is being fetched by querying the RDBMS instance:</p>
<pre class="brush: sql;">SELECT member,        DECODE(status, 'CURRENT', 1, 0),        DECODE(archived, 'YES', 1, 0)   FROM (select lf.member,        l.status,        l.archived        from v$logfile lf, v$log l        WHERE lf.group# = l.group# AND        l.thread# = :ora_thread AND        l.sequence# = :ora_seq_no AND        (lf.status NOT IN        ('INVALID','INCOMPLETE','STALE') OR        lf.status is null)        order by lf.member DESC )        where rownum = 1</pre>
<p>Note that this query is a bit different from the one used when fetching the log file name on a file system. Both <em>dbms_diskgroup.getfileattr</em> and <em>dbms_diskgroup.open</em> calls happen when a new log file needs to be accessed (during startup or after a log file switch, for example), i.e. we do not execute these as part of the main loop listed above.</p>
<p>The next thing to find out is what happens in the ASM dedicated process as a result of <em>dbms_diskgroup.read</em> call:</p>
<pre class="brush: bash; highlight: [4];">read(16, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 8155
read(16, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 8208
read(16, &quot;\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\v\1\33\5&#92;&#48;&#92;&#48;\31\2\2\r&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\372&#92;&#48;\300&#92;&#48;M&quot;..., 8155) = 4803
pread(15, &quot;\1\&quot;&#92;&#48;&#92;&#48;\22 &#92;&#48;&#92;&#48;\21&#92;&#48;&#92;&#48;&#92;&#48;H\200\305\365\5\24&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 28672, 136324096) = 28672
write(16, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\v\1\5\326\4&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\240/V\232&quot;..., 8155) = 8155
write(16, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\v\5\1&#92;&#48;\4&#92;&#48;&#92;&#48;&#92;&#48;\237&#92;&#48;&#92;&#48;\1g\251\v&#92;&#48;&#92;&#48;&#92;&#48;b\251\1\2&quot;..., 8155) = 8155
write(16, &quot;\37\333&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;\3&#92;&#48;\4&#92;&#48;\5&#92;&#48;\6&#92;&#48;\7&#92;&#48;\10&#92;&#48;\t&#92;&#48;\n&#92;&#48;\v&#92;&#48;\f&quot;..., 8155) = 8155</pre>
<p>If you take a look at the <em>lsof</em> output above, the file descriptor 16 is a socket back to the Extract process so these <em>syscalls</em> are related to SQL*Net messages. What we&#8217;re really interested in is a call on line# 4 with a <em>pread syscall</em> from the file descriptor 15:</p>
<pre class="brush: bash;">[root@gg1 fd]# ls -l 15
lrwx------ 1 oracle oinstall 64 Jan 24 20:58 15 -&gt; /dev/sdb1</pre>
<p>Of course, <em>/dev/sdb1</em> is a device under our REDO diskgroup:</p>
<pre class="brush: sql;">SQL&gt; select d.path, g.name
        from v$asm_disk d, v$asm_diskgroup g
        where d.group_number = g.group_number
                and d.path='/dev/sdb1';  2    3    4

PATH            NAME
--------------- ----------------
/dev/sdb1       REDO</pre>
<p>What else can we say? First of all, the read size is significantly smaller, just 28672 bytes, compared to the 1000K read size used when the log is located on a cooked file system. Combine such a small read size with the fact that the data needs to go through the network and SQL*Net stacks, and I would expect this to be much less efficient than reading online redo logs from a file system. One of the immediate things to realize is that, in case you&#8217;re running the Extract process on the same machine as your ASM instance, it probably makes total sense to configure the connection string to the ASM instance (the one which is being specified in the Extract process parameters) using the <em>bequeath</em> protocol, so the traffic can go through a pipe instead of a socket, which should provide better performance.</p>
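<p>A rough sketch of what the smaller read size means in practice: it counts how many ASM-sized reads are needed to cover the amount of data a single file system read returned. The two sizes come from the traces; the helper name is hypothetical, and the sketch ignores SQL*Net overhead entirely.</p>

```python
import math

FILE_SYSTEM_READ_BYTES = 1024000  # read size seen in the strace output in part I
ASM_READ_BYTES = 28672            # pread size seen in the ASM server process


def reads_needed(total_bytes, read_size):
    """Number of reads of read_size bytes needed to cover total_bytes."""
    return math.ceil(total_bytes / read_size)


print(reads_needed(FILE_SYSTEM_READ_BYTES, ASM_READ_BYTES))  # 36
```

<p>Each of those reads is also a round trip between the Extract process and the ASM server process, so the transport used for that connection is likely to matter.</p>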
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/7459/oracle-goldengate-extract-internals-part-ii/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Oracle GoldenGate Extract Internals, Part I</title>
		<link>http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i/</link>
		<comments>http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 15:50:23 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Technical Blog]]></category>

		<guid isPermaLink="false">http://www.pythian.com/news/?p=7225</guid>
		<description><![CDATA[Since GoldenGate has been declared as a strategic direction for replication technology by Oracle, it sounds like it's time to get up to speed with various aspects of how this technology works and performs.

As many of you are probably aware, up until recently, GoldenGate had been a third-party product. Technology-wise this presents an interesting challenge for the GoldenGate development team as they have to rely on whatever Oracle makes available to the outside world. Let's see what kind of techniques they were able to utilize in order to achieve their goals.

I did a simple replication setup between two different databases with the Extract, DataPump and Replicat processes. I'm planning to take a look at all of these but today it's the Extract process's turn.
]]></description>
			<content:encoded><![CDATA[<p>Since GoldenGate has been declared as a strategic direction for replication technology by Oracle, it sounds like it&#8217;s time to get up to speed with various aspects of how this technology works and performs.</p>
<p>As many of you are probably aware, up until recently, GoldenGate had been a third-party product. Technology-wise this presents an interesting challenge for the GoldenGate development team as they have to rely on whatever Oracle makes available to the outside world. Let&#8217;s see what kind of techniques they were able to utilize in order to achieve their goals.</p>
<p>I did a simple replication setup between two different databases with the Extract, DataPump and Replicat processes. I&#8217;m planning to take a look at all of these but today it&#8217;s the Extract process&#8217;s turn.</p>
<p><b>Oracle GoldenGate Extract Process</b><br />
The main duty of the Extract process is to read and process Oracle redo logs in order to extract relevant changes and write these to a trail.</p>
<p><b>Reading from the Redo Logs</b><br />
This is probably the most interesting aspect of the Extract process as this is where various technology stacks are being bridged together. Unless you configure the Extract process to read strictly from the archived logs, it will try to read from online redo logs whenever possible. Let&#8217;s take a look at an <em>strace</em> of a running Extract process:</p>
<p><span id="more-7225"></span></p>
<pre class="brush: bash; highlight: [1,6,11];">lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)                 = 0
...
read(18, &quot;\1\&quot;&#92;&#48;&#92;&#48;J\24&#92;&#48;&#92;&#48;U&#92;&#48;&#92;&#48;&#92;&#48;H\200\301\302&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;\n&#92;&#48;\20&#92;&#48;&quot;..., 1024000) = 1024000
...
lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)                 = 0
...
read(18, &quot;\1\&quot;&#92;&#48;&#92;&#48;J\24&#92;&#48;&#92;&#48;U&#92;&#48;&#92;&#48;&#92;&#48;H\200\301\302&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;\n&#92;&#48;\20&#92;&#48;&quot;..., 1024000) = 1024000
...
lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)</pre>
<p>I&#8217;ve omitted a bunch of <em>syscalls</em> as I&#8217;ll get to these a bit later, plus it makes it much easier to see and understand the loop. Judging by the <em>nanosleep syscall</em> you can figure out that this cycle repeats every second. I&#8217;ve left file descriptor 18 in place since it&#8217;s what points to our current redo log file:</p>
<pre class="brush: bash;">[oracle@gg1 fd]$ ls -l 18
lr-x------ 1 oracle oinstall 64 Jan 19 16:18 18 -&gt; /u02/oradata/GG1/onlinelog/o1_mf_1_5oco39l6_.log</pre>
<p>Let&#8217;s take a closer look at what happens inside that loop. The loop begins with an <em>lseek syscall</em> which sets the offset for the specified file descriptor. After a one second delay the Extract process performs a 1000K read starting at the offset set by the <em>lseek</em>. You can see multiple <em>lseek</em> calls to set the offset to exactly the same position, one for each time the loop gets executed. This is necessary since the <em>read syscall</em> advances the current position in a file, hence we need an <em>lseek syscall</em> to get us back to where we&#8217;ve started.</p>
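<p>The loop described above can be sketched in a few lines. This is only an illustration of the seek, sleep and read pattern seen in the trace, not the Extract implementation; the function name is made up, and a temporary file stands in for the online redo log.</p>

```python
import os
import tempfile
import time

READ_SIZE = 1024000  # 1000K, the read size seen in the strace output


def poll_once(fd, offset, delay_seconds=1.0):
    """Position at offset, wait, then read up to READ_SIZE bytes from there."""
    os.lseek(fd, offset, os.SEEK_SET)  # reset the position, as each loop iteration does
    time.sleep(delay_seconds)          # the one second nanosleep
    return os.read(fd, READ_SIZE)      # the read advances the file position


# Demonstration against a temporary file standing in for an online redo log.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 1024 + b"new redo data")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    first = poll_once(fd, 1024, delay_seconds=0)
    second = poll_once(fd, 1024, delay_seconds=0)  # same offset, same bytes
finally:
    os.close(fd)
    os.remove(path)

print(first == second == b"new redo data")  # True
```

<p>Without the seek before each read, the second call would start where the first one stopped and return nothing, which is exactly why the trace shows the offset being reset on every iteration.</p>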
<p>As you&#8217;ve probably already guessed, this is nothing but a constant polling of the redo log file contents to see whether the new data has arrived. You can confirm that 2659328 is the current position in the online redo log by using a query against <em>x$kcccp</em>:</p>
<pre class="brush: sql;">SQL&gt; select cpodr_bno, cpodr_bno*512, (cpodr_bno-1)*512
  2  	from x$kcccp where indx=0;

 CPODR_BNO CPODR_BNO*512 (CPODR_BNO-1)*512
---------- ------------- -----------------
      5195       2659840           2659328</pre>
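<p>The arithmetic in the query can be restated as a tiny sketch. It assumes the 512-byte redo block size used in the query; the function names are hypothetical.</p>

```python
REDO_BLOCK_SIZE = 512  # block size used in the query above; may differ on other platforms


def block_to_offset(cpodr_bno):
    """Byte offset matching the (cpodr_bno-1)*512 column of the query."""
    return (cpodr_bno - 1) * REDO_BLOCK_SIZE


def offset_to_block(offset):
    """Inverse mapping: the cpodr_bno value that corresponds to a byte offset."""
    return offset // REDO_BLOCK_SIZE + 1


print(block_to_offset(5195))     # 2659328
print(offset_to_block(2659328))  # 5195
```

<p>The first value is the offset passed to <em>lseek</em> in the trace, which is how the two observations line up.</p>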
<p>The Extract process starts reading from (and including) the last written block. When the new data arrives, the Extract process advances the offset appropriately:</p>
<pre class="brush: bash; highlight: [1,6];">lseek(18, 2729984, SEEK_SET)            = 2729984
nanosleep({1, 0}, NULL)                 = 0
...
read(18, &quot;\1\&quot;&#92;&#48;&#92;&#48;\324\24&#92;&#48;&#92;&#48;U&#92;&#48;&#92;&#48;&#92;&#48;\20\200b\227\204&#92;&#48;&#92;&#48;&#92;&#48;\5h&#92;&#48;&#92;&#48;\200\321\10&#92;&#48;\1&#92;&#48;w5&quot;..., 1024000) = 1024000
...
lseek(18, 2731008, SEEK_SET)            = 2731008
nanosleep({1, 0}, NULL)                 = 0</pre>
<p>This polling continues until the Extract process detects a supported operation in the redo stream which potentially requires capturing. Remember that redo records contain object ids, not object names. When an object identifier is encountered for the first time, the Extract process issues a series of statements against the data dictionary to find out what it is and to determine the corresponding object name and type. In the case of an index, it also checks whether it is an IOT in order to resolve the table name properly. These queries are relatively lightweight; however, if you have a large number of objects in your database, your data dictionary may become bombarded with them when the Extract process starts.</p>
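<p>The burst of dictionary queries at startup follows from resolving each object id only the first time it is seen. A minimal Python sketch of that idea, assuming a plain in-memory cache: the class, the lookup callable and the object ids below are all hypothetical, and the trace does not show how the Extract process actually stores this information.</p>

```python
class ObjectNameCache:
    """Resolve object ids to (name, type), asking the dictionary once per id."""

    def __init__(self, lookup):
        self._lookup = lookup        # hypothetical callable: object id to (name, type)
        self._names = {}
        self.dictionary_calls = 0

    def resolve(self, object_id):
        if object_id not in self._names:
            # First sighting: this is where the dictionary queries would run
            self.dictionary_calls += 1
            self._names[object_id] = self._lookup(object_id)
        return self._names[object_id]


# Made-up dictionary contents standing in for the real lookup queries
FAKE_DICTIONARY = {101: ("SRC.T1", "TABLE"), 102: ("SRC.T1_PK", "INDEX")}

cache = ObjectNameCache(FAKE_DICTIONARY.__getitem__)
for object_id in [101, 101, 102, 101, 102]:
    cache.resolve(object_id)
print(cache.dictionary_calls)   # 2
```

<p>Five redo records touching two objects cost two lookups, so the dictionary load depends on how many distinct objects show up, and it is heaviest right after a start when the cache is empty.</p>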
<p><b>Challenges</b><br />
There are a number of interesting scenarios which may occur while reading the online redo log. To begin with, you&#8217;re dealing with a file to which <em>lgwr</em> is actively writing. Processes inside an Oracle database use a vast amount of coordination to make sure they do not step on each other&#8217;s toes. A third-party process operating outside of the database has no access to that coordination: all it can do is constantly observe the situation and react accordingly, while making sure it doesn&#8217;t get in anybody&#8217;s way. For example, it would be nice to ask <em>lgwr</em> to notify us when new data has been written to the online redo log file, but we can&#8217;t, so we have to resort to constant polling to see if something new has appeared. This is one of the areas to keep an eye on as GoldenGate integration with Oracle progresses over time.</p>
<p>Oracle redo logs are written sequentially in a circular fashion, which makes the task of reading them a bit less challenging. However, you may still find yourself in various nasty situations once you think through the possibilities. For example, while you&#8217;re reading the next 1000K, <em>lgwr</em> might be writing to the same area, so some (or all) of the data you read back might be garbage. My guess would be that the Extract process just keeps reading forward until it encounters data that is no longer valid (due to an invalid redo record or an <em>RBA</em> not from the expected sequence) and then resumes from the last valid record encountered.</p>
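<p>The guess above, read forward until something stops looking valid, can be sketched in a few lines of Python. Everything about the block layout here is made up for illustration: the 8-byte header holding a log sequence and a block number is not the real redo block format, and <code>make_block</code> and <code>count_valid_blocks</code> are hypothetical helpers.</p>

```python
import struct

BLOCK_SIZE = 512
HEADER = struct.Struct("!II")   # made-up header: (log sequence, block number)


def make_block(seq, block_no):
    """Build a fake block: header followed by zero padding."""
    return HEADER.pack(seq, block_no).ljust(BLOCK_SIZE, b"\x00")


def count_valid_blocks(chunk, expected_seq, first_block_no):
    """Count leading blocks that carry the expected sequence and block number."""
    valid = 0
    for start in range(0, len(chunk) - BLOCK_SIZE + 1, BLOCK_SIZE):
        seq, block_no = HEADER.unpack_from(chunk, start)
        if seq != expected_seq or block_no != first_block_no + valid:
            break   # stale or half-written data: stop and retry from here later
        valid += 1
    return valid


# Three fresh blocks of sequence 120, then leftovers from an older sequence
chunk = b"".join([make_block(120, 10), make_block(120, 11),
                  make_block(120, 12), make_block(117, 13)])
print(count_valid_blocks(chunk, 120, 10))   # 3
```

<p>Under these assumptions the reader would consume three blocks and come back to the fourth on a later pass.</p>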
<p>Here is something more interesting to think about. What if, while you&#8217;re reading the redo log file, <em>lgwr</em> manages to go all the way around (i.e. switch through all redo groups and get back to the one you&#8217;re currently reading from) and overwrite the data you&#8217;re currently working with? This race condition is relatively unlikely to happen in real life, but it is still a possibility, so I was curious enough to test it out.</p>
<p>I&#8217;ve blocked the Extract process, switched the redo logs around and made sure that the current online redo log gets overwritten with new data past the current extract point. Here is an excerpt of the relevant data from the <em>strace</em> output for the Extract process:</p>
<pre class="brush: bash;">lseek(18, 3027456, SEEK_SET)            = 3027456
nanosleep({1, 0}, NULL)                 = 0
fsync(19)                               = 0
lseek(15, 0, SEEK_SET)                  = 0
write(15, &quot;H&#92;&#48;&#92;&#48;\10R&#92;&#48;\n&#92;&#48;\2&#92;&#48;\4&#92;&#48;A&#92;&#48;\1&#92;&#48;\1&#92;&#48;C&#92;&#48;\4&#92;&#48;\3&#92;&#48;\1&#92;&#48;S&#92;&#48;\30&#92;&#48;Sf&quot;..., 2048) = 2048
fsync(15)                               = 0
stat(&quot;/etc/localtime&quot;, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
stat(&quot;/etc/localtime&quot;, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
stat(&quot;/etc/localtime&quot;, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
select(8, [7], NULL, NULL, {0, 0})      = 0 (Timeout)
read(18, &quot;\1\&quot;&#92;&#48;&#92;&#48;\31\27&#92;&#48;&#92;&#48;{&#92;&#48;&#92;&#48;&#92;&#48;\340\201\205N****************&quot;..., 1024000) = 1024000
write(16, &quot;&#92;&#48;$&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3N3\7&#92;&#48;&#92;&#48;&#92;&#48;\2&#92;&#48;&#92;&#48;&#92;&#48;`&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7&#92;&#48;011&quot;..., 36) = 36
read(17, &quot;&#92;&#48;\254&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\4\1&#92;&#48;&#92;&#48;&#92;&#48;002\24\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;{\5&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 172
write(4, &quot;2010-01-20 12:49:02.084  Redo th&quot;..., 146) = 146
write(16, &quot;&#92;&#48;.&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\3N4\16&#92;&#48;&#92;&#48;&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;`&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\7&#92;&#48;031&quot;..., 46) = 46
read(17, &quot;\1\2&#92;&#48;&#92;&#48;\6&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\6\1\&quot;&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;\1&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&#92;&#48;&quot;..., 8208) = 258
stat(&quot;/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc&quot;, {st_mode=S_IFREG|0640, st_size=3034112, ...}) = 0
open(&quot;/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc&quot;, O_RDONLY|O_DIRECT) = 20</pre>
<p>Let&#8217;s go through what happened and how this situation was handled. The loop starts by positioning the file descriptor where we left off, at offset 3027456. After a one-second delay, a 1000K redo log read follows. Upon startup, the Extract process establishes two connections to the database using whatever <em>userid</em> credentials you specified in the parameter file. File descriptors 16 and 17 point to one of these sessions (a dedicated Oracle process in my case), which the Extract process can use to query various information about the database. The query sent on line 12 checks whether the sequence and thread from which we&#8217;re currently reading got archived:</p>
<pre class="brush: sql;">SELECT DECODE(archived, 'YES', 1, 0),       status  FROM v$log WHERE thread# = :ora_thread AND       sequence# = :ora_seq_no</pre>
<p>In our case, that statement returns zero rows since this particular sequence (we were reading seq# 120 when the wrap occurred) is no longer there. My guess would be that this is one of the clues the Extract process might be using to deal with the situation. Another clue is what was read from the redo log file itself. Remember that each Extract read includes the last written redo block, which can potentially be compared with the same block from the previous read in order to detect whether wrapping has occurred. After realizing that the log has wrapped, the Extract process writes a warning into its report file (line 14):</p>
<pre class="brush: bash;">2010-01-20 12:49:02.084  Redo thread 1: Online log /u02/oradata/GG1/onlinelog/o1_mf_2_5og4zot6_.log on sequence# 120 has missing trailing blocks.</pre>
<p>The next step is to fall back and figure out where to pick up. This is accomplished by another query, sent on line 15:</p>
<pre class="brush: sql;">SELECT  name    FROM gv$archived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = 'YES' AND         deleted = 'NO' AND         name not like '+%'         AND standby_dest = 'NO'</pre>
<p>The bind values are:</p>
<pre class="brush: bash;">BINDS #14:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=96 off=0
  kxsbbbfp=2ba8c340f618  bln=32  avl=03  flg=05
  value=&quot;120&quot;
 Bind#1
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=0 off=32
  kxsbbbfp=2ba8c340f638  bln=32  avl=01  flg=01
  value=&quot;1&quot;
 Bind#2
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=0 off=64
  kxsbbbfp=2ba8c340f658  bln=32  avl=09  flg=01
  value=&quot;708599728&quot;</pre>
<p>In other words, we&#8217;re looking into <em>gv$archived_log</em> to find out where the required sequence went (note how ASM files are bypassed by means of the <em>name not like &#8216;+%&#8217;</em> condition). Once the name of the archived log file has been returned, the Extract process proceeds with opening and reading from it. After that, the Extract process checks whether the next sequence is available to read from the online log file:</p>
<pre class="brush: sql;">SELECT lf.member,        DECODE(l.status, 'CURRENT', 1, 0),        DECODE(l.archived, 'YES', 1, 0)   FROM v$logfile lf, v$log l  WHERE lf.group# = l.group# AND        l.thread# = :ora_thread AND        l.sequence# = :ora_seq_no AND        lf.member not like '+%' AND        rownum = 1 AND        (lf.status NOT IN        ('INVALID','INCOMPLETE') OR        lf.status is null)</pre>
<p>If the sequence is available, the online redo log file will be opened; otherwise, the query against <em>v$archived_log</em> will be issued to find the required archived log file to open. As I&#8217;ve already mentioned, the Extract process prefers to read from the online log file whenever possible.</p>
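<p>The decision described above boils down to &#8220;online log first, archived log second&#8221;. A minimal Python sketch of that preference, with two dictionaries standing in for the queries: <code>pick_log_source</code> is a hypothetical name, and what the Extract process does when neither lookup returns a row is not something this test shows, so the sketch simply returns <code>None</code>.</p>

```python
def pick_log_source(thread, sequence, online_logs, archived_logs):
    """Prefer the online log for (thread, sequence); fall back to the archive."""
    key = (thread, sequence)
    if key in online_logs:        # stands in for the v$logfile / v$log query
        return "online", online_logs[key]
    if key in archived_logs:      # stands in for the archived log query
        return "archived", archived_logs[key]
    return None                   # not available anywhere yet


# Sequence 120 and its archived path come from the strace output above;
# the sequence number attached to the online log file is made up.
archived = {(1, 120): "/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc"}
online = {(1, 123): "/u02/oradata/GG1/onlinelog/o1_mf_1_5oco39l6_.log"}

print(pick_log_source(1, 120, online, archived)[0])   # archived
print(pick_log_source(1, 123, online, archived)[0])   # online
print(pick_log_source(1, 124, online, archived))      # None
```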
<p>It was indeed a pleasure to see that this situation was handled correctly. The warning emitted into the report file was nice to see as well (now you know one of the situations in which this warning may appear). The GoldenGate development team has earned some credit, at least for this particular situation.</p>
<p>Stay tuned for the next part.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/7225/oracle-goldengate-extract-internals-part-i/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Oracle Streams Apply Process changes in 11GR2</title>
		<link>http://www.pythian.com/news/6665/oracle-streams-apply-process-changes-in-11gr2/</link>
		<comments>http://www.pythian.com/news/6665/oracle-streams-apply-process-changes-in-11gr2/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 21:37:30 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Technical Blog]]></category>
		<category><![CDATA[11g]]></category>
		<category><![CDATA[11gR2]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://www.pythian.com/news/?p=6665</guid>
		<description><![CDATA[A couple of weeks ago Christo Kutrovsky told me about an Oracle Streams presentation he saw at this year&#8217;s UKOUG. The presentation was from CERN&#8217;s Eva Dafonte Pérez and, among other things, Eva mentioned substantial performance enhancements observed in 11GR2.
It is somewhat timely that we&#8217;ve been doing some Oracle Golden Gate testing which in [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago <a href="http://www.pythian.com/news/author/kutrovsky">Christo Kutrovsky</a> told me about an Oracle Streams presentation he saw at this year&#8217;s UKOUG. The presentation was from CERN&#8217;s Eva Dafonte Pérez and, among other things, Eva mentioned substantial performance enhancements observed in 11GR2.</p>
<p>It is somewhat timely that we&#8217;ve been doing some <a href="http://www.oracle.com/technology/software/products/goldengate/index.html">Oracle Golden Gate</a> testing which in turn made me curious to take a closer look at Oracle Streams in 11GR2 and see where all the performance is coming from.</p>
<p>I&#8217;ve set up simple replication for table <em>t1</em> from schema <code>src</code> to schema <code>dst</code>, changed the Apply Server parallelism to 1, and ran a simple test, inserting 100 rows while tracing the Apply Server session with SQL trace:<br />
<span id="more-6665"></span></p>
<pre class="brush: sql;">SQL&gt;desc src.t1;
Name Type          Nullable Default Comments
---- ------------- -------- ------- --------
N    NUMBER
V    VARCHAR2(100) Y                       

SQL&gt; select count(*) from src.t1;

COUNT(*)
----------
       0

SQL&gt; select count(*) from dst.t1;

COUNT(*)
----------
       0

SQL&gt; select sid from v$streams_apply_server;

     SID
----------
      22

SQL&gt; exec dbms_monitor.session_trace_enable(22, waits =&gt; true);

PL/SQL procedure successfully completed

SQL&gt; insert into src.t1
2  	select level, to_char(level)
3  		from dual
4  		connect by level &lt;= 100;

100 rows inserted

SQL&gt; commit;

Commit complete

SQL&gt; select count(*) from dst.t1;

COUNT(*)
----------
     100

SQL&gt; exec dbms_monitor.session_trace_disable(22);

PL/SQL procedure successfully completed
</pre>
<p>To my surprise, when I went to take a look at the trace file, I couldn&#8217;t find anything related to the Apply process inserting rows into <code>dst.t1</code>, only a handful of internal housekeeping statements. This made me curious as to where all the work really went, so I decided to take a look at <code>v$sql</code> and see if there were any clues:</p>
<pre class="brush: sql;">SQL&gt; select sql_text
2  	from v$sql
3  	where lower(sql_text) like '%insert%dst%t1%';

SQL_TEXT
--------------------------------------------------------------------------------
select sql_text  from v$sql  where lower(sql_text) like '%insert%dst%t1%'</pre>
<p>The only thing I was able to see there was&nbsp;.&nbsp;.&nbsp;.&nbsp; my own statement. Things were starting to look serious.</p>
<p>In order to finally figure out who did what, I started a LogMiner session:</p>
<pre class="brush: sql;">
SQL&gt; begin
2  	dbms_logmnr.start_logmnr(
3  		startTime =&gt; sysdate-30/1440,
4  		endTime =&gt; sysdate,
5  		Options =&gt; dbms_logmnr.DICT_FROM_ONLINE_CATALOG+dbms_logmnr.CONTINUOUS_MINE
6  	);
7  end;
8  /

PL/SQL procedure successfully completed</pre>
<p>Let&#8217;s see who actually inserted all these rows:</p>
<pre class="brush: sql;">SQL&gt; select * from (
2  select session#, sql_redo
3  	from v$logmnr_contents
4  	where operation='INSERT'
5  		and seg_owner='DST'
6  		and table_name='T1'
7  	order by timestamp desc
8  ) where rownum &lt;= 5;

 SESSION# SQL_REDO
---------- --------------------------------------------------------------------------------
       22 insert into &quot;DST&quot;.&quot;T1&quot;(&quot;N&quot;,&quot;V&quot;) values ('1','1');
       22 insert into &quot;DST&quot;.&quot;T1&quot;(&quot;N&quot;,&quot;V&quot;) values ('5','5');
       22 insert into &quot;DST&quot;.&quot;T1&quot;(&quot;N&quot;,&quot;V&quot;) values ('4','4');
       22 insert into &quot;DST&quot;.&quot;T1&quot;(&quot;N&quot;,&quot;V&quot;) values ('3','3');
       22 insert into &quot;DST&quot;.&quot;T1&quot;(&quot;N&quot;,&quot;V&quot;) values ('2','2');
</pre>
<p>Session with SID 22 is nothing else but our Apply Server&nbsp;.&nbsp;.&nbsp;.&nbsp;</p>
<p>The next step was to try and figure out whether we&#8217;re really dealing with some new codepath responsible for such spectacular performance (apparently, due to the complete lack of instrumentation :) or whether this is just another weird Oracle bug.</p>
<p>I&#8217;ve blocked the Apply Server on updating a row and looked at the Apply Server&#8217;s stack:</p>
<pre class="brush: bash;">
[oracle@ora11gr2 trace]$ pstack 17036
#0  0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1  0x00000000085ef3f3 in sskgpwwait ()
#2  0x00000000085ee5c6 in skgpwwait ()
#3  0x000000000829ee31 in ksliwat ()
#4  0x000000000829e422 in kslwaitctx ()
#5  0x0000000000af92f5 in ksqcmi ()
#6  0x00000000082ac019 in ksqgtlctx ()
#7  0x00000000082aa77a in ksqgelctx ()
#8  0x0000000000c4d566 in ktcwit1 ()
#9  0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x0000000001f57c14 in kddiruUpdate ()
#13 0x000000000179eeab in knasdaExecDML ()
#14 0x000000000179d928 in knasdaProcDML ()
#15 0x000000000178c6fd in knaspd ()
#16 0x0000000001787d2f in knasplcr ()
#17 0x00000000017866d7 in knaspx ()
#18 0x0000000001770fd5 in knalsProc1Txn ()
#19 0x000000000177022d in knalsptxn ()
#20 0x00000000017424a6 in knasm2 ()
#21 0x0000000001776d8d in knalsma ()
#22 0x0000000000c25a7d in knlkcbkma ()
#23 0x0000000000b93ba7 in ksvrdp ()
#24 0x00000000020d2dd7 in opirip ()
#25 0x00000000016fe729 in opidrv ()
#26 0x0000000001b7183f in sou2o ()
#27 0x00000000009d3f8a in opimai_real ()
#28 0x0000000001b76ace in ssthrdmain ()
#29 0x00000000009d3e71 in main ()
</pre>
<p>Before we move on, here is a stack dump from a blocked Apply Server in 10.2.0.4:</p>
<pre class="brush: bash; highlight: [25];">
[oracle@ora10gr2 trace]$ pstack 23787
#0  0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1  0x00000000085ef3f3 in sskgpwwait ()
#2  0x00000000085ee5c6 in skgpwwait ()
#3  0x000000000829ee31 in ksliwat ()
#4  0x000000000829e422 in kslwaitctx ()
#5  0x0000000000af92f5 in ksqcmi ()
#6  0x00000000082ac019 in ksqgtlctx ()
#7  0x00000000082aa77a in ksqgelctx ()
#8  0x0000000000c4d566 in ktcwit1 ()
#9  0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x00000000084588c9 in updrow ()
#13 0x00000000084f2580 in qerupFetch ()
#14 0x0000000008453cdd in updaul ()
#15 0x0000000008451bca in updThreePhaseExe ()
#16 0x00000000084509f5 in updexe ()
#17 0x00000000083fe18f in opiexe ()
#18 0x00000000083f5c0d in opiall0 ()
#19 0x0000000008403d25 in opikpr ()
#20 0x00000000083f78b9 in opiodr ()
#21 0x00000000084892af in __PGOSF141_rpidrus ()
#22 0x00000000085ee820 in skgmstack ()
#23 0x000000000848a759 in rpiswu2 ()
#24 0x000000000848fdf4 in kprball ()
#25 0x0000000001c7c4d7 in knipxup ()
#26 0x0000000001c72651 in knipdis ()
#27 0x000000000178cacc in knaspd ()
#28 0x0000000001787d2f in knasplcr ()
#29 0x00000000017866d7 in knaspx ()
#30 0x0000000001770fd5 in knalsProc1Txn ()
#31 0x000000000177022d in knalsptxn ()
#32 0x00000000017424a6 in knasm2 ()
#33 0x0000000001776d8d in knalsma ()
#34 0x0000000000c25a7d in knlkcbkma ()
#35 0x0000000000b93ba7 in ksvrdp ()
#36 0x00000000020d2dd7 in opirip ()
#37 0x00000000016fe729 in opidrv ()
#38 0x0000000001b7183f in sou2o ()
#39 0x00000000009d3f8a in opimai_real ()
#40 0x0000000001b76ace in ssthrdmain ()
#41 0x00000000009d3e71 in main ()</pre>
<p>The stack is only 30 functions deep in 11.2.0.1 compared to 42 in 10.2.0.4! Given that everything from the <code>ktcwit1 ()</code> function to the top of the stack (frames #0 through #8) is just the session waiting on the enqueue in both cases, the relative codepath change is even bigger.</p>
<p>All the difference comes from one key thing: a recursive call. If you take a look at line #25 (highlighted), you&#8217;ll notice the <code>rpiswu2 ()</code> function (for those of you unfamiliar with Oracle Kernel Layers, RPI stands for Recursive Program Interface). Whatever happens further up the stack is essentially the same codepath any user session would use while executing an UPDATE statement. The Apply Servers in 10.2.0.4 generally behave like any other user session would, and whatever diagnostic techniques you have learned while troubleshooting user issues can, to a large extent, be applied to the Apply Servers as well. Every LCR execution leads to at least one recursive call (so if you have, say, a transaction with 1000 LCRs, that would be at least 1000 recursive calls by the Apply Server). In 11.2.0.1 the recursive call is missing and the codepath is different up to the <code>kauupd ()</code> (KA, Access Layer) function.</p>
<p>Indeed, by looking at the Apply Server statistics in 11.2.0.1 you will notice that executing an LCR no longer results in a recursive call, so the entire change seems to revolve around a shortcut which allows the Apply Server to proceed directly into the KD (Data) layer, bypass the &#8220;regular&#8221; codepath and avoid a recursive call.</p>
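<p>If you want to check the comparison mechanically, the two stacks can be split into the frames they share and the frames that differ. The Python sketch below does that with the function names copied from the two <code>pstack</code> listings above (innermost frame first); <code>split_stacks</code> is a hypothetical helper written for this post, not part of any tool.</p>

```python
FRAMES_11G = """semtimedop sskgpwwait skgpwwait ksliwat kslwaitctx ksqcmi
ksqgtlctx ksqgelctx ktcwit1 kdddgb kdusru kauupd kddiruUpdate knasdaExecDML
knasdaProcDML knaspd knasplcr knaspx knalsProc1Txn knalsptxn knasm2 knalsma
knlkcbkma ksvrdp opirip opidrv sou2o opimai_real ssthrdmain main""".split()

FRAMES_10G = """semtimedop sskgpwwait skgpwwait ksliwat kslwaitctx ksqcmi
ksqgtlctx ksqgelctx ktcwit1 kdddgb kdusru kauupd updrow qerupFetch updaul
updThreePhaseExe updexe opiexe opiall0 opikpr opiodr __PGOSF141_rpidrus
skgmstack rpiswu2 kprball knipxup knipdis knaspd knasplcr knaspx
knalsProc1Txn knalsptxn knasm2 knalsma knlkcbkma ksvrdp opirip opidrv sou2o
opimai_real ssthrdmain main""".split()


def split_stacks(a, b):
    """Return (shared inner frames, only in a, only in b, shared outer frames)."""
    shortest = min(len(a), len(b))
    head = 0
    while head != shortest and a[head] == b[head]:
        head += 1
    tail = 0
    while (tail != shortest - head
           and a[len(a) - 1 - tail] == b[len(b) - 1 - tail]):
        tail += 1
    return (a[:head], a[head:len(a) - tail],
            b[head:len(b) - tail], a[len(a) - tail:])


inner, only_11g, only_10g, outer = split_stacks(FRAMES_11G, FRAMES_10G)
print(len(FRAMES_11G), len(FRAMES_10G))   # 30 42
print(inner[-1], outer[0])                # kauupd knaspd
print(only_11g)                           # ['kddiruUpdate', 'knasdaExecDML', 'knasdaProcDML']
print(len(only_10g), "rpiswu2" in only_10g)   # 15 True
```

<p>Between <code>knaspd ()</code> and <code>kauupd ()</code> the 11.2.0.1 stack has three frames where 10.2.0.4 has fifteen, and the recursive call sits inside those fifteen.</p>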
<p>On a side note, it appears that this new codepath was first introduced in 11.1.0.7.</p>
<h3>What&#8217;s up with the instrumentation?</h3>
<p>While the performance improvement is certainly most welcome, there is a big downside: all these new functions seem to be poorly, or not at all, instrumented. This makes it hard to evaluate the gains, as some stuff is simply not there.</p>
<p>How are you supposed to figure out what&#8217;s going on then? The good news is that all regular dynamic performance views (like <code>v$session_wait</code>, <code>v$session_event</code>, etc.) seem to be populated correctly, but sql trace took a big hit (plus you can no longer see <code>sql_id</code> in <code>v$session</code>). Whatever falls out of the &#8220;old&#8221; stuff looks like a black box&nbsp;.&nbsp;.&nbsp;.&nbsp; pretty much.</p>
<p>Puzzled by this problem, I&#8217;ve tried to see whether there is any easy way to enable the old codepath so you can get all the instrumentation facilities back in place. After some trial and error, it turned out that a simple row level trigger&nbsp;.&nbsp;.&nbsp;.&nbsp;</p>
<pre class="brush: sql;">SQL&gt; create or replace trigger dst.buid_t1 before delete or insert or update on dst.t1
 2  for each row
 3  begin
 4  	null;
 5  end;
 6  /

Trigger created

SQL&gt; begin
 2  	dbms_ddl.set_trigger_firing_property('DST', 'BUID_T1', false);
 3  end;
 4  /

PL/SQL procedure successfully completed</pre>
<p>&nbsp;.&nbsp;.&nbsp;.&nbsp;is enough to get the old codepath back. Here is the stack of Apply Server process in 11.2.0.1 with such a trigger in place:</p>
<pre class="brush: bash; highlight: [26];">[oracle@ora11gr2 trace]$ pstack 30640
#0  0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1  0x00000000085ef3f3 in sskgpwwait ()
#2  0x00000000085ee5c6 in skgpwwait ()
#3  0x000000000829ee31 in ksliwat ()
#4  0x000000000829e422 in kslwaitctx ()
#5  0x0000000000af92f5 in ksqcmi ()
#6  0x00000000082ac019 in ksqgtlctx ()
#7  0x00000000082aa77a in ksqgelctx ()
#8  0x0000000000c4d566 in ktcwit1 ()
#9  0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x00000000084588c9 in updrow ()
#13 0x00000000084f2580 in qerupFetch ()
#14 0x00000000046a363f in qerstFetch ()
#15 0x0000000008453cdd in updaul ()
#16 0x0000000008451bca in updThreePhaseExe ()
#17 0x00000000084509f5 in updexe ()
#18 0x00000000083fe18f in opiexe ()
#19 0x00000000083f5c0d in opiall0 ()
#20 0x0000000008403d25 in opikpr ()
#21 0x00000000083f78b9 in opiodr ()
#22 0x00000000084892af in __PGOSF141_rpidrus ()
#23 0x00000000085ee820 in skgmstack ()
#24 0x000000000848a759 in rpiswu2 ()
#25 0x000000000848fdf4 in kprball ()
#26 0x0000000001c7c4d7 in knipxup ()
#27 0x0000000001c72651 in knipdis ()
#28 0x000000000178cacc in knaspd ()
#29 0x0000000001787d2f in knasplcr ()
#30 0x00000000017866d7 in knaspx ()
#31 0x0000000001770fd5 in knalsProc1Txn ()
#32 0x000000000177022d in knalsptxn ()
#33 0x00000000017424a6 in knasm2 ()
#34 0x0000000001776d8d in knalsma ()
#35 0x0000000000c25a7d in knlkcbkma ()
#36 0x0000000000b93ba7 in ksvrdp ()
#37 0x00000000020d2dd7 in opirip ()
#38 0x00000000016fe729 in opidrv ()
#39 0x0000000001b7183f in sou2o ()
#40 0x00000000009d3f8a in opimai_real ()
#41 0x0000000001b76ace in ssthrdmain ()
#42 0x00000000009d3e71 in main ()</pre>
<p>Now that looks much more familiar! All the instrumentation appeared to be back in place as well.</p>
<p>I&#8217;ve also discovered that:</p>
<ol>
<li><code>DELETE</code> seems to be always handled through the old codepath.</li>
<li>If you have a unique constraint or a primary key supported by a non-unique index, <code>INSERT</code> will fall back to the old codepath.</li>
<li><code>UPDATE</code> needs a primary key or key column(s) supported by an index in order to use the new codepath.</li>
</ol>
<p>It remains to be seen whether this new codepath has been implemented as a shortcut for the most frequently used scenarios only, or whether these are implementation restrictions that will be lifted in future releases&nbsp;.&nbsp;.&nbsp;.&nbsp; or maybe not, given that Golden Gate is taking over as the strategic direction.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/6665/oracle-streams-apply-process-changes-in-11gr2/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Stabilize Oracle Bind Peeking Behaviour with Range-Based Predicates</title>
		<link>http://www.pythian.com/news/1022/stabilize-oracle-bind-peeking-behaviour-with-range-based-predicates/</link>
		<comments>http://www.pythian.com/news/1022/stabilize-oracle-bind-peeking-behaviour-with-range-based-predicates/#comments</comments>
		<pubDate>Wed, 28 May 2008 18:40:30 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[bind peeking]]></category>
		<category><![CDATA[range-based predicates]]></category>

		<guid isPermaLink="false">http://www.pythian.com/blogs/1022/stabilize-oracle-bind-peeking-behaviour-with-range-based-predicates</guid>
		<description><![CDATA[In my previous post, I described the most common cause for unstable plans due to bind peeking &#8212; histograms. It is now time to move forward and take a look at another case, namely range-based predicates. Strictly speaking, the cases I&#8217;m going to describe can appear without range-based predicates as well, you just need to [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, I described <a href="http://www.pythian.com/blogs/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms">the most common cause for unstable plans due to bind peeking &#8212; histograms</a>. It is now time to move forward and take a look at another case, namely <em>range-based predicates</em>. Strictly speaking, the cases I&#8217;m going to describe can appear without range-based predicates as well; you just need to remember that a range-based <em>operation</em> doesn&#8217;t necessarily imply a range-based <em>predicate</em>.</p>
<h3>How Can Range-Based Predicates Cause an Unstable Plan?</h3>
<p>Quite easily. Take the following example:</p>
<pre>
SQL&gt; create table t as
  2  	select level n, rpad('x', 200, 'x') v
  3  		from dual
  4  		connect by level &lt;= 100000;

Table created

SQL&gt; create index i_t_n on t (n);

Index created

SQL&gt; exec dbms_stats.gather_table_stats(user, 't');

PL/SQL procedure successfully completed
</pre>
<p>Now, I&#8217;ll query the table using two different conditions:</p>
<pre>
SQL&gt; set autot traceonly explain
SQL&gt; select * from t where n &lt;= 100;

Execution Plan
-----------------------------------------------------
Plan hash value: 2912310446

----------------------------------------------------
| Id  | Operation                   | Name  | Rows |
----------------------------------------------------
|   0 | SELECT STATEMENT            |       |   95 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   95 |
|*  2 |   INDEX RANGE SCAN          | I_T_N |   95 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N"&lt;=100)

SQL&gt; select * from t where n &lt;= 25000;

Execution Plan
------------------------------------------
Plan hash value: 1601196873

------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      | 24998 |
|*  1 |  TABLE ACCESS FULL| T    | 24998 |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N"&lt;=25000)
</pre>
<p><em>(Note that I&#8217;ve trimmed the output for the sake of readability.)</em></p>
<p>The first query was executed using <code>INDEX RANGE SCAN</code>, the second one using <code>TABLE ACCESS FULL</code>. This makes perfect sense, since the first query is going to return only a small fraction of the data in the table, while the second one is going to fetch substantially more. As you probably already guessed, if you replace the literal value with a bind variable, your plan will depend on what value was passed during the hard parse: <span id="more-1022"></span></p>
<pre>
SQL&gt; variable n number;
SQL&gt; exec :n:=25000;

PL/SQL procedure successfully completed.

SQL&gt; set autot traceonly stat
SQL&gt; set arraysize 100
SQL&gt; select * from t where n&lt;=:n;

25000 rows selected.

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3198  consistent gets
       2650  physical reads
          0  redo size
     271952  bytes sent via SQL*Net to client
       3135  bytes received via SQL*Net from client
        251  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25000  rows processed

SQL&gt; exec :n:=100;

PL/SQL procedure successfully completed.

SQL&gt; select * from t where n&lt;=:n;

100 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2949  consistent gets
       2647  physical reads
          0  redo size
       1479  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
</pre>
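<p>You can already tell from the statistics that both queries used <code>full table scans</code> (the number of consistent gets is nearly the same for 100 rows as for 25000). The cached plan confirms it:</p>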
<pre>
SQL&gt; select sql_id, executions, child_number
  2   from v$sql
  3   where sql_text='select * from t where n&lt;=:n';

SQL_ID        EXECUTIONS CHILD_NUMBER
------------- ---------- ------------
avj8fuq6s3j2q          2            0

SQL&gt; select * from table(dbms_xplan.display_cursor('avj8fuq6s3j2q'));

PLAN_TABLE_OUTPUT
------------------------------------------
SQL_ID  avj8fuq6s3j2q, child number 0
-------------------------------------
select * from t where n&lt;=:n

Plan hash value: 1601196873

------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      |       |
|*  1 |  TABLE ACCESS FULL| T    | 24998 |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N"&lt;=:N)

18 rows selected.
</pre>
<p><em>(Note that I&#8217;ve trimmed the output again.)</em></p>
<p>If I had started my example with <code>n=100</code>, both executions would have used <code>index range</code> scans instead. What if both plans are equally important to you and you can&#8217;t sacrifice one in favor of the other? There are a number of solutions to that; I&#8217;m going to describe my favorite one.</p>
<h3>Tell the Database About It</h3>
<p>You have to cooperate with the database, because if you do not, then the database will not cooperate with you. It is a fair game and the choice is yours.</p>
<p>After all, you are the only one who has intimate knowledge of your data and how it is used. Let&#8217;s say your application is using the following function which returns a ref-cursor:</p>
<pre>SQL&gt; create or replace function get_data(
  2  	p_n	in number
  3  ) return sys_refcursor is
  4  	l_rf sys_refcursor;
  5  begin
  6  	open l_rf for 'select * from t where n&lt;=:n' using p_n;
  7  	return l_rf;
  8  end;
  9  /

Function created</pre>
<p>The above function will be subject to the same problem as described above: the plan will be developed on the first hard parse, and all subsequent executions will use the same plan regardless of the fact that it might be inefficient. The optimal execution plan is driven by how much data you request, which in turn depends on what bind variable value was supplied.</p>
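<p>A toy model makes the order dependence easy to see. The Python sketch below is not how the optimizer works: <code>ToyCursorCache</code> is a hypothetical class, the switch point of 1000 rows is made up, and the real decision is cost-based. It only mimics the one behaviour that matters here, namely that the plan is chosen once per SQL text from the first bind value and then reused.</p>

```python
class ToyCursorCache:
    """Toy model of bind peeking: one cached plan per distinct SQL text."""

    def __init__(self, index_up_to_rows):
        self.index_up_to_rows = index_up_to_rows   # made-up switch point
        self.plans = {}
        self.hard_parses = 0

    def execute(self, sql_text, n):
        if sql_text not in self.plans:             # hard parse: peek at n once
            self.hard_parses += 1
            if n <= self.index_up_to_rows:
                self.plans[sql_text] = "INDEX RANGE SCAN"
            else:
                self.plans[sql_text] = "TABLE ACCESS FULL"
        return self.plans[sql_text]                # later bind values are ignored


SQL = "select * from t where n<=:n"

big_first = ToyCursorCache(index_up_to_rows=1000)
print(big_first.execute(SQL, 25000), "/", big_first.execute(SQL, 100))
# TABLE ACCESS FULL / TABLE ACCESS FULL

small_first = ToyCursorCache(index_up_to_rows=1000)
print(small_first.execute(SQL, 100), "/", small_first.execute(SQL, 25000))
# INDEX RANGE SCAN / INDEX RANGE SCAN
```

<p>Same statement, same two bind values, and the plan depends only on which call happened to come first.</p>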
<p>There is a point where Oracle <em>should</em> switch to a full table scan (FTS) instead of an index range scan (IRS), and vice versa. Ultimately, we would like to make our function smart enough to provide us with different execution plans while, at the same time, leaving the final decision to the optimizer. If we expect potentially different execution plans each time <code>p_n</code> crosses a power of 2, then all we have to do is tell the database about it:</p>
<pre>SQL&gt; create or replace function get_data(
  2  	p_n	in number
  3  ) return sys_refcursor is
  4  	l_rf sys_refcursor;
  5  begin
  6  	open l_rf for 'select /* '||to_char(power(2, floor(log(2, p_n))))||' */ * from t where n&lt;=:n' using p_n;
  7  	return l_rf;
  8  end;
  9  /

Function created</pre>
<p>The above function will automatically add a comment to each SQL statement, embedding the number of rows we expect, rounded down to a power of 2:</p>
<pre>SQL&gt; variable rf refcursor;
SQL&gt; exec :rf:=get_data(100);

PL/SQL procedure successfully completed.

SQL&gt; print rf;

100 rows selected.

SQL&gt; exec :rf:=get_data(25000);

PL/SQL procedure successfully completed.

SQL&gt; print rf;

25000 rows selected.

SQL&gt; select	s.sql_text,
  2  		max(p.operation||' '||p.options) keep (dense_rank last order by id) access_path
  3    	from v$sql s, v$sql_plan p
  4    	where s.SQL_ID=p.SQL_ID
  5  		and s.sql_text like 'select /* % */ * from t where n&lt;=:n'
  6  	group by s.sql_id, s.sql_text;

SQL_TEXT                                 ACCESS_PATH
---------------------------------------- --------------------
select /* 16384 */ * from t where n&lt;=:n  TABLE ACCESS FULL
select /* 64 */ * from t where n&lt;=:n     INDEX RANGE SCAN
</pre>
<p>Since different comments essentially lead to different SQL statements, Oracle was able to develop appropriate plans.</p>
<p>This solution is very simple, and it provides the desired results. Its advantage is that, compared to replacing bind variables with literals, you still share most of your SQL while not sacrificing the quality of execution plans. Our function will result in only 17 different SQL statements for all possible values:</p>
<pre>SQL&gt; declare
  2   l_rf sys_refcursor;
  3  begin
  4   for i in 1 .. 100000
  5   loop
  6    l_rf:=get_data(i);
  7    close l_rf;
  8   end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

SQL&gt; select s.sql_text,
  2      max(p.operation||' '||p.options)
  3     keep (dense_rank last order by id) access_path,
  4    max(executions) execs
  5     from v$sql s, v$sql_plan p
  6     where s.SQL_ID=p.SQL_ID
  7      and s.sql_text like 'select /* % */ * from t where n&lt;=:n'
  8     group by s.sql_text
  9   order by execs;

SQL_TEXT                                 ACCESS_PATH           EXECS
---------------------------------------- -------------------- ------
select /* 1 */ * from t where n&lt;=:n      INDEX RANGE SCAN          1
select /* 2 */ * from t where n&lt;=:n      INDEX RANGE SCAN          3
select /* 4 */ * from t where n&lt;=:n      INDEX RANGE SCAN          4
select /* 8 */ * from t where n&lt;=:n      INDEX RANGE SCAN          8
select /* 16 */ * from t where n&lt;=:n     INDEX RANGE SCAN         16
select /* 32 */ * from t where n&lt;=:n     INDEX RANGE SCAN         32
select /* 64 */ * from t where n&lt;=:n     INDEX RANGE SCAN         64
select /* 128 */ * from t where n&lt;=:n    INDEX RANGE SCAN        128
select /* 256 */ * from t where n&lt;=:n    INDEX RANGE SCAN        256
select /* 512 */ * from t where n&lt;=:n    INDEX RANGE SCAN        512
select /* 1024 */ * from t where n&lt;=:n   INDEX RANGE SCAN       1024
select /* 2048 */ * from t where n&lt;=:n   INDEX RANGE SCAN       2048
select /* 4096 */ * from t where n&lt;=:n   INDEX RANGE SCAN       4096
select /* 8192 */ * from t where n&lt;=:n   INDEX RANGE SCAN       8192
select /* 16384 */ * from t where n&lt;=:n  INDEX RANGE SCAN      16384
select /* 32768 */ * from t where n&lt;=:n  TABLE ACCESS FULL     32768
select /* 65536 */ * from t where n&lt;=:n  TABLE ACCESS FULL     34464

17 rows selected.
</pre>
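<p>As a rough cross-check of the statement count, the comment labels can be generated without touching <code>t</code> at all. The query below is only a sketch: it reuses the same <code>power(2, floor(log(2, n)))</code> expression against <code>dual</code> and counts how many values from 1 to 100000 would share each comment:</p>
<pre>-- One row per distinct comment label, with the number of p_n values
-- (1 .. 100000) that map to it.
select power(2, floor(log(2, n))) bucket,
       count(*) n_values
  from (select level n from dual connect by level &lt;= 100000)
 group by power(2, floor(log(2, n)))
 order by bucket;</pre>
<p>The execution counts in the listing above (3 for <code>/* 2 */</code>, and 34464 rather than 34465 for <code>/* 65536 */</code>) suggest that <code>log(2, n)</code> can come out marginally below the whole number when <code>n</code> is an exact power of 2, so such a value lands in the bucket below. If so, that only shifts each boundary by one value and does not change the number of distinct statements.</p>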
<p>Because the decision is still made by the optimizer, the above function will be able to automatically adjust to changing data volumes as well. If tomorrow our table  contains 100M rows instead of 100K, our function will still be able to provide us good execution plans &#8212;  which, by the way, might be different from what we see with 100K rows. This is a key difference from explicitly hinting the query.</p>
<p>Note that we can still have different execution plans inside a &#8220;transient&#8221; window.  That should not be a problem, since this is where a plan switch should occur anyway, and doing the entire window one way or the other should not produce bad results. You can even go a bit further and make the base adjustable; this will allow us to properly balance SQL statement reusability against optimal execution plans:</p>
<pre>
SQL&gt; create or replace function get_data(
  2  	p_n	in number,
  3  	p_f	in number default 2
  4  ) return sys_refcursor is
  5  	l_rf sys_refcursor;
  6  begin
  7  	open l_rf for 'select /* '||to_char(power(p_f, floor(log(p_f, p_n))))||' */ * from t where n&lt;=:n' using p_n;
  8  	return l_rf;
  9  end;
 10  /

Function created
</pre>
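<p>A quick way to see the effect of the extra parameter, assuming the function above is in place: with a base of 10, values of <code>p_n</code> in the tens of thousands should all map to the same comment, so far fewer distinct statements are generated than with the default base of 2. The following is a sketch rather than captured output:</p>
<pre>variable rf refcursor;

-- floor(log(10, 25000)) = 4, so the generated text carries /* 10000 */
exec :rf := get_data(25000, 10);

-- List the statements generated so far and the comment each one carries
select sql_text, executions
  from v$sql
 where sql_text like 'select /* % */ * from t where n&lt;=:n';</pre>
<p>A larger base means more sharing but coarser control over where a plan switch can happen; a smaller base means the opposite.</p>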
<p>The above technique can be used virtually everywhere you have to have bind variables, and at the same time absolutely need different execution plans depending on bind variable values. You will only have to come up with a proper way to tell Oracle when you expect different plans,  based on your domain knowledge and how your data is being used.</p>
<h3>What About 11G?</h3>
<p>There is a new feature introduced in 11G to help overcome the above problem &#8212; the ability to produce a different child cursor if the database detects a potentially suboptimal execution plan due to bind peeking. If you are interested in how well it performs, stay tuned for the next blog post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/1022/stabilize-oracle-bind-peeking-behaviour-with-range-based-predicates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle 11G Result Cache in the Real World</title>
		<link>http://www.pythian.com/news/1004/oracle-11g-result-cache-in-the-real-world/</link>
		<comments>http://www.pythian.com/news/1004/oracle-11g-result-cache-in-the-real-world/#comments</comments>
		<pubDate>Tue, 13 May 2008 19:46:13 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[AskTom]]></category>
		<category><![CDATA[oltp]]></category>
		<category><![CDATA[Oracle 11g]]></category>
		<category><![CDATA[Oracle Magazine]]></category>

		<guid isPermaLink="false">http://www.pythian.com/blogs/1004/oracle-11g-result-cache-in-the-real-world</guid>
		<description><![CDATA[As some of you probably already noticed, there was  a thread on AskTom discussing the scalability tests I did back in 2007. You are welcome to read the entire thread, but in a nutshell, Tom Kyte claimed that my tests did not reflect how one would use the result cache in the  real [...]]]></description>
			<content:encoded><![CDATA[<p>As some of you probably already noticed, there was <a href="http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:676698900346506951"> a thread on AskTom</a> discussing the <a href="http://www.pythian.com/blogs/683/oracle-11g-result-cache-tested-on-eight-way-itanium">scalability tests</a> I did back in 2007. You are welcome to read the entire thread, but in a nutshell, Tom Kyte claimed that my tests did not reflect how one would use the result cache in the  real world.</p>
<h3>What is &#8220;real world?&#8221;</h3>
<p>Of course, the important question is whether I tested a feature in a way it was never designed to be used,  or whether someone is just trying to make an excuse for poor scalability results by <em>defining</em> &#8220;real world&#8221; in a way that makes my tests inappropriate.</p>
<h3>A new feature</h3>
<p>What do you do, then, when you first see a new feature? You read about it in the documentation, and then you test it in order to compare what you have read with what you have in reality.</p>
<h3>What the documentation tells us</h3>
<p>Open <a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/memory.htm#i53187">the Performance Tuning Guide</a> and go to <em>7.3.1.4 Result Cache Concepts</em>:</p>
<blockquote><p>When these queries and functions are executed repeatedly, the results are retrieved directly from the cache memory. This results in a faster response time. The cached results stored become invalid when data in the dependent database objects is modified. The use of the result cache is a database-wide decision.</p></blockquote>
<p>All it says is that you have to have repeatedly-executed functions and queries to get faster response time. It says nothing about what <em>kind</em> of queries or functions. It also suggests that the result cache should be used database-wide or shouldn&#8217;t be used at all (which is perfectly sound according to Jonathan Lewis&#8217;s <a href="http://jonathanlewis.wordpress.com/2008/05/02/rules-for-hinting">Rules for Hinting</a>).</p>
<p>Now skip ahead to <em>7.3.2.7 Use of Result Cache</em>:</p>
<blockquote><p>OLTP applications can benefit significantly from the use of the result cache. The benefits highly depend on the application. Consider the use of the PL/SQL function result cache and the SQL query result cache when evaluating whether your application can benefit from the result cache.</p></blockquote>
<p>It clearly says that result cache is perfectly appropriate for OLTP applications. They  leave a backdoor with the words, &#8220;<em>depend on the application</em>&#8221; but, yet again, they say nothing about what <em>kind</em> of OLTP applications.</p>
<p><span id="more-1004"></span></p>
<h3>Am I reading something wrong there?</h3>
<p>Now, I may well be misunderstanding something, but I got the following impressions about the result cache after reading the documentation:</p>
<ul>
<li>Its use is recommended for frequently-executed functions and queries.</li>
<li>It is to be used database-wide or not used at all.</li>
<li>It is highly recommended for OLTP applications.</li>
</ul>
<p>The above doesn&#8217;t leave the impression that the result cache is not appropriate for caching small queries and functions being executed at a very high rate with high degree of concurrency (yes, I am spelling out the basic definition of an OLTP system right now). Moreover, the documentation actually suggests that it is <em>recommended</em> to use result cache in such circumstances.</p>
<h3>What are others doing?</h3>
<p>Let&#8217;s say there is still a chance that I misread the documentation. Others must be doing something else, then. So I started to search around to see other examples.</p>
<p>I saw that Steven Feuerstein wrote <a href="http://www.oracle.com/technology/oramag/oracle/07-sep/o57plsql.html">On the PL/SQL Function Result Cache</a>. You can spot the  words &#8220;<em>best practices</em>&#8221; and &#8220;<em>the real world</em>&#8221; there. He even explains which kind of application he is simulating. This article was published in Oracle Magazine, which means someone actually reviewed it before publishing, and it shows good result cache usage.</p>
<p>What is the test about? <em>It caches a function that does single-row lookups into a table.</em> I was doing exactly the same in my scalability tests with the exception that I was caching the query itself rather than a PL/SQL function that calls it. The results, though, are quite different. Of course, you can shave off more by caching a function call, and this article is able to demonstrate quite a bit of improvement, which is probably why no one has complained about this test being less than &#8220;real world&#8221; &#8212; despite the fact that it is doing essentially the same thing as mine did.</p>
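<p>For readers who have not seen the feature, a result-cached lookup function of the kind described above might look roughly like the sketch below. The function name is hypothetical and the code is not taken from the Oracle Magazine article; it assumes the <code>employees</code> table created later in this post.</p>
<pre>-- Hypothetical single-row lookup; RESULT_CACHE asks Oracle 11g to keep
-- the return value for each distinct p_employee_id in the result cache.
create or replace function get_emp_v_cached(
	p_employee_id	in employees.employee_id%type
) return employees.v%type
	result_cache relies_on (employees)
is
	l_v employees.v%type;
begin
	select v into l_v
		from employees
		where employee_id = p_employee_id;
	return l_v;
end;
/</pre>
<p>The uncached variant would be the same function without the <code>result_cache relies_on (employees)</code> clause.</p>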
<h3>What about scalability?</h3>
<p>The tests published in the Oracle Magazine article lack an important part:  they didn&#8217;t test concurrency. If you read my blog posts regarding result cache, you already know that it is not the pure speed of getting something out which is a problem &#8212; it is <em>concurrency</em> that kills the result cache.</p>
<p>If I am right about the result cache not being scalable, I should see the same results. So I took the same test case and tested it in a concurrent environment (we&#8217;re after &#8220;real world&#8221; here). Here is how I created an employees table:</p>
<pre>
SQL&gt; create table employees as
  2  	select level employee_id, rpad('x', 100, 'x') v
  3  		from dual
  4  		connect by level &lt;= 10000;

Table created

SQL&gt; alter table employees add constraint pk_employees primary key (employee_id);

Table altered
</pre>
<p>Note that I am already giving the result cache an artificial advantage by not placing this table into a hash cluster (and a hash cluster is perfectly acceptable, considering the test case description).</p>
<p>After the employees table, I created two functions exactly as in the Oracle Magazine article &#8212; one that does result caching, and another that does not. I ran my tests in exactly the same way as I did before &#8212; parallel job processes executed the function using a random <code>employee_id</code>. I measured how many executions per second we were able to achieve across 1 million executions. I repeated every test twice, and picked the best result. I also made sure both runs provided consistent results. As before, all tests were performed on an 8-way Itanium server.  Here is what I got:</p>
<table border="1">
<tr>
<th># of processes</th>
<th>Not Cached</th>
<th>% linear</th>
<th>Cached</th>
<th>% linear</th>
</tr>
<tr>
<td>1</td>
<td>12133</td>
<td>100%</td>
<td>76453</td>
<td>100%</td>
</tr>
<tr>
<td>2</td>
<td>21739</td>
<td>90%</td>
<td>112296</td>
<td>73%</td>
</tr>
<tr>
<td>3</td>
<td>31945</td>
<td>88%</td>
<td>102810</td>
<td>45%</td>
</tr>
<tr>
<td>4</td>
<td>42243</td>
<td>87%</td>
<td>104904</td>
<td>34%</td>
</tr>
<tr>
<td>5</td>
<td>52323</td>
<td>86%</td>
<td>106678</td>
<td>28%</td>
</tr>
<tr>
<td>6</td>
<td>62716</td>
<td>86%</td>
<td>101695</td>
<td>22%</td>
</tr>
<tr>
<td>7</td>
<td>70190</td>
<td>83%</td>
<td>94569</td>
<td>18%</td>
</tr>
<tr>
<td>8</td>
<td>70485</td>
<td>73%</td>
<td>43386</td>
<td>7%</td>
</tr>
</table>
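<p>The &#8220;% linear&#8221; columns appear to compare each measured rate with perfect scaling, that is, with the single-process rate multiplied by the number of processes. As a sketch, the cached column can be recomputed from the figures in the table like this (the inline view simply restates three of the rows above):</p>
<pre>-- pct_linear = rate with N processes / (N * rate with 1 process)
with r as (
	select 1 procs, 76453 cached from dual union all
	select 2, 112296 from dual union all
	select 8, 43386 from dual
)
select procs, cached,
       round(100 * cached / (procs * first_value(cached) over (order by procs))) pct_linear
  from r
 order by procs;</pre>
<p>For two processes this gives 112296 / (2 &times; 76453), or about 73%; for eight, 43386 / (8 &times; 76453), or about 7%.</p>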
<p>And a nice graph:</p>
<p><a href='http://www.pythian.com/blogs/wp-content/uploads/pl_scal.JPG' title='pl_scal.JPG'><img src='http://www.pythian.com/blogs/wp-content/uploads/pl_scal.JPG' alt='pl_scal.JPG' /></a></p>
<p>I hope you guessed which line is red.</p>
<p>And if we draw a graph out of &#8220;% linear&#8221;:</p>
<p><a href='http://www.pythian.com/blogs/wp-content/uploads/pl_linear.JPG' title='pl_linear.JPG'><img src='http://www.pythian.com/blogs/wp-content/uploads/pl_linear.JPG' alt='pl_linear.JPG' /></a></p>
<p>I wonder if anyone still says that the result cache scales.  Because it simply doesn&#8217;t.</p>
<p>Of course, you may invalidate my tests by saying that, because everyone lives in a single-user environment,  concurrent tests are not &#8220;real world&#8221;.</p>
<h3>So what now?</h3>
<p>I understand Tom&#8217;s point about my using result cache in a way he doesn&#8217;t envision it. Let&#8217;s not forget, however,  that I wrote my blog post back in November 2007, and back then, all we had was the standard Oracle documentation. I see my tests as perfectly in accord with what the documentation says (and you can actually see others using result cache as I did). And instead of discarding all these tests by simply &#8220;adjusting&#8221; what &#8220;real world&#8221; means, I thought I could better spend my time by clarifying these issues in the documentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/1004/oracle-11g-result-cache-in-the-real-world/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Stabilize Oracle 10G&#8217;s Bind Peeking Behaviour by Cutting Histograms</title>
		<link>http://www.pythian.com/news/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms/</link>
		<comments>http://www.pythian.com/news/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms/#comments</comments>
		<pubDate>Tue, 18 Mar 2008 18:48:11 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Group Blog Posts]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[bind peeking]]></category>
		<category><![CDATA[histograms]]></category>
		<category><![CDATA[Oracle 10g]]></category>

		<guid isPermaLink="false">http://www.pythian.com/blogs/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms</guid>
		<description><![CDATA[I wrote this post because I feel there is a great need for it. The number of people struggling with unstable query plans due to bind peeking in Oracle 10G is enormous, to say the least. More than that, solutions like disabling bind variable peeking are driving us away from understanding the root cause of [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote this post because I feel there is a great need for it. The number of people struggling with unstable query plans due to bind peeking in Oracle 10G is enormous, to say the least. More than that, solutions like disabling bind variable peeking are driving us away from understanding the root cause of the problem and applying the right fix to it.</p>
<h3>What are the causes of unstable plans due to bind variable peeking?</h3>
<p>There are three things that might put you at risk of unstable plans due to bind variable peeking. Those are histograms, partitions, and range-based predicates. I&#8217;ll cover the last two in upcoming blog posts.</p>
<h3>Histograms</h3>
<p>Let me share with you my d&eacute;j&agrave; vu. When I see this:</p>
<pre>
SQL&gt; select value
  2  	from v$parameter
  3  	where name='_optim_peek_user_binds';

VALUE
--------------------
FALSE
</pre>
<p>I immediately expect this:</p>
<pre>
SQL&gt; select	sum(case when max_cnt &gt; 2 then 1 else 0 end) histograms,
  2  		sum(case when max_cnt &lt;= 2 then 1 else 0 end) no_histograms
  3  from (
  4  	select table_name, max(cnt) max_cnt
  5  		from (
  6  			select table_name, column_name, count(*) cnt
  7  				from dba_tab_histograms
  8  				group by table_name, column_name
  9  		) group by table_name
 10  );

HISTOGRAMS NO_HISTOGRAMS
---------- -------------
      1169          2494
</pre>
<p>The above is an example from a real-world OLTP system running with bind peeking disabled. It is no surprise to me.  An exception, you say?   Here&#8217;s another one&nbsp;.&nbsp;.&nbsp;. <span id="more-867"></span></p>
<pre>
SQL&gt; select value
  2  	from v$parameter
  3  	where name='_optim_peek_user_binds';

VALUE
--------------------
FALSE

SQL&gt;
SQL&gt; select	sum(case when max_cnt &gt; 2 then 1 else 0 end) histograms,
  2  		sum(case when max_cnt &lt;= 2 then 1 else 0 end) no_histograms
  3  from (
  4  	select table_name, max(cnt) max_cnt
  5  		from (
  6  			select table_name, column_name, count(*) cnt
  7  				from dba_tab_histograms
  8  				group by table_name, column_name
  9  		) group by table_name
 10  );

HISTOGRAMS NO_HISTOGRAMS
---------- -------------
       304           521
</pre>
<p>Here comes my d&eacute;j&agrave; vu. If you tell me that you disabled bind peeking, my immediate response will be &#8220;do you have a lot of unnecessary histograms?&#8221; On an OLTP system, there is no way that you need histograms on a third of your tables (and I can hardly think of any DSS system where this number of histograms can be justified).</p>
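<p>To see where those histograms actually live, a query along the following lines can help. It is only a sketch and assumes access to <code>DBA_TAB_COL_STATISTICS</code>, whose <code>HISTOGRAM</code> column (available from 10G) reports <code>NONE</code> for columns without one:</p>
<pre>-- Columns that currently carry a histogram, per owner and table
select owner, table_name, column_name, histogram, num_buckets
  from dba_tab_col_statistics
 where histogram &lt;&gt; 'NONE'
 order by owner, table_name, column_name;</pre>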
<h3>How histograms and bind variable peeking can cause an unstable plan</h3>
<p>I&#8217;ll give you a simple example:</p>
<pre>
SQL&gt; create table t as
  2  	select case when level &lt;= 9900 then 0 else level-9900 end n,
  3  			rpad('*', 100, '*') v
  4  		from dual
  5  		connect by level &lt;= 10000;

Table created

SQL&gt; create index i_t_n on t (n);

Index created

SQL&gt; exec dbms_stats.gather_table_stats(user, 't', method_opt =&gt; 'for columns n size 254', cascade =&gt; true);

PL/SQL procedure successfully completed
</pre>
<p>In other words, we have a table <code>T</code> with 10000 rows, where 9900 rows have <code>N=0</code> and 100 rows have <code>N from 1 to 100</code>. The histogram tells Oracle about this data distribution:</p>
<pre>SQL&gt; select column_name, endpoint_number, endpoint_value
  2  	from user_tab_histograms
  3  	where table_name='T';

COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- --------------- --------------
N                     9900              0
N                     9901              1
N                     9902              2
N                     9903              3
N                     9904              4
N                     9905              5
...skipped for clarity...
N                     9997             97
N                     9998             98
N                     9999             99
N                    10000            100

101 rows selected</pre>
<p>The above is known as a skewed data distribution, and a histogram can help Oracle to choose the right plan, depending on the value:</p>
<pre>SQL&gt; set autot traceonly explain
SQL&gt; select * from t where n=0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9900 |   531K|    42   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |  9900 |   531K|    42   (0)| 00:00:01 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N"=0)

SQL&gt; select * from t where n=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2912310446

--------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    55 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    55 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_T_N |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N"=1)</pre>
<p>That is, Oracle is able to choose an index range scan when <code>N=1</code> (returns only one row) and do a full table scan (FTS) when <code>N=0</code> (returns 9900 rows) &#8212; perfect execution plans given the conditions. What does this have to do with bind peeking, you ask? Well, imagine that you&#8217;ve submitted the following query:</p>
<pre>
select * from t where n=:n;
</pre>
<p>We used a bind variable in place of a literal. On a hard parse, Oracle will peek at the value you&#8217;ve used for <code>:n</code>, and will optimize the query as if you&#8217;ve submitted the same query with this literal instead. The problem is that, in 10G, bind variable peeking happens only on a hard parse, which means that all following executions will use the same plan, regardless of the bind variable value. This is easy enough to demonstrate:</p>
<pre>SQL&gt; variable n number;
SQL&gt; exec :n:=0;

PL/SQL procedure successfully completed.

SQL&gt; set autot traceonly stat

SQL&gt; select * from t where n=:n;

9900 rows selected.

Statistics
----------------------------------------------------------
        982  recursive calls
          0  db block gets
        951  consistent gets
          0  physical reads
          0  redo size
     106664  bytes sent via SQL*Net to client
       7599  bytes received via SQL*Net from client
        661  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
       9900  rows processed

SQL&gt; exec :n:=1;

PL/SQL procedure successfully completed.

SQL&gt; select * from t where n=:n;

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        155  consistent gets
          0  physical reads
          0  redo size
        476  bytes sent via SQL*Net to client
        350  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
</pre>
<p>You can see that both executions used the same cursor with a full table scan:</p>
<pre>SQL&gt; select sql_id, executions, child_number
  2   from v$sql
  3   where sql_text = 'select * from t where n=:n';

SQL_ID        EXECUTIONS CHILD_NUMBER
------------- ---------- ------------
g2n32un6t1c55          2            0

SQL&gt; select * from table(dbms_xplan.display_cursor('g2n32un6t1c55'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
SQL_ID  g2n32un6t1c55, child number 0
-------------------------------------
select * from t where n=:n

Plan hash value: 1601196873

--------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    42 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |  9900 |   531K|    42   (0)| 00:00:01 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N"=:N)

18 rows selected.
</pre>
<p>Both queries would use an index range scan if I had started my example with <code>exec :n:=1</code> instead of <code>exec :n:=0</code>.</p>
<p>Now, if 99% of your queries are not interested in querying the above table with <code>:N:=0</code>, then you want the plan with an index range scan because it will provide optimal performance most of the time, while resulting in suboptimal performance for only 1% of executions. With the histogram in place, one day you will be unlucky enough to have Oracle hard parse the query with a bind variable value of 0, which will force everyone else to use an FTS (as was demonstrated above), which in turn will result in abysmal performance for 99% of executions (until the next hard parse, when you might get lucky again). And if you have a system where a third of the tables have histograms on them then &#8212; I think you probably get the idea now.</p>
<h3>What to do?</h3>
<p>Well, just get rid of any histogram that does nothing but mess up your execution plans. That&#8217;s easy enough:</p>
<pre>SQL&gt; exec dbms_stats.gather_table_stats(user, 't', method_opt =&gt; 'for columns n size 1', cascade =&gt; true);

PL/SQL procedure successfully completed</pre>
<p>Oracle will no longer have distribution information, and you&#8217;ll get the same plan regardless of the value:</p>
<pre>SQL&gt; select * from t where n=0;

Execution Plan
----------------------------------------------------------
Plan hash value: 2912310446

--------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    99 |  5445 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |    99 |  5445 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_T_N |    99 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N"=0)

SQL&gt; select * from t where n=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2912310446

--------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    99 |  5445 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |    99 |  5445 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_T_N |    99 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N"=1)</pre>
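<p>The estimate of 99 rows in both plans is what you would expect from a uniform-distribution assumption: without a histogram, the optimizer has only the number of rows and the number of distinct values to go on, and 10000 / 101 is roughly 99. A sketch of how to check this from the dictionary:</p>
<pre>-- With SIZE 1 there is no histogram on N, so the equality estimate is
-- roughly num_rows / num_distinct.
select t.num_rows, c.num_distinct,
       round(t.num_rows / c.num_distinct) est_rows
  from user_tables t, user_tab_col_statistics c
 where t.table_name = 'T'
   and c.table_name = t.table_name
   and c.column_name = 'N';</pre>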
<p>No more surprises! This also means that you&#8217;ll get the same plan regardless of the bind variable value during a hard parse. However, the real question is&nbsp;.&nbsp;.&nbsp;.</p>
<h3>What is making all those histograms I didn&#8217;t ask for?</h3>
<p>It is the default behavior.  With the declaration of RBO&#8217;s obsolescence in 10G, we were also presented with a default gather stats job in every 10G database. This job runs with a whole bunch of AUTO parameters, but one parameter is of particular interest to us:</p>
<pre>SQL&gt; select dbms_stats.get_param('method_opt') method_opt from dual;

METHOD_OPT
--------------------------------------------------------------------
FOR ALL COLUMNS SIZE AUTO
</pre>
<p>The <code>SIZE ...</code> part controls histogram collection. You can get the definition of <code>AUTO</code> in the Oracle documentation:</p>
<blockquote><p>AUTO: Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.</p></blockquote>
<p>Well, as it says &#8212; Oracle will decide on which columns to collect histograms. Not you. As a result, every day when this job runs, you might be presented with something new to make your life more interesting.</p>
<h3>Why is <code>SIZE AUTO</code> doing such a bad job?</h3>
<p>Because coming up with optimal parameters for statistic gathering involves many more variables than the DBMS_STATS package can ever have. As a result, <code>AUTO</code> tends to be a &#8220;lowest common denominator&#8221;.</p>
<p>For example, if you have an OLTP system, chances are you don&#8217;t need histograms at all (apart from on a couple of tables, maybe). But how can <code>DBMS_STATS</code> know which type of system you run? OLTP, DWH, or mixed? Or maybe you&#8217;re doing OLTP five days per week and DWH on a weekend? <code>DBMS_STATS</code> tries to use some heuristics to come up with an answer, and that&#8217;s why two days of DWH on a weekend can completely screw up your OLTP activity for the rest of the week. <code>DBMS_STATS</code> just doesn&#8217;t have enough information. We humans do.</p>
<h3>What could we use instead of AUTO?</h3>
<p>Because of the above, there is no single answer, since it involves &#8220;know your data&#8221; and &#8220;apply your domain knowledge&#8221;. However, there is one option that works particularly well for most environments. I&#8217;m talking about <code>REPEAT</code>.</p>
<blockquote><p><code>REPEAT</code>: Collects histograms only on the columns that already have histograms.</p></blockquote>
<p>That is, Oracle will no longer make histograms you didn&#8217;t ask for. This will be your first step in stabilizing your bind peeking behaviour:</p>
<pre>SQL&gt; exec dbms_stats.set_param('method_opt', 'FOR ALL COLUMNS SIZE REPEAT');

PL/SQL procedure successfully completed</pre>
<h3>What to do with existing histograms?</h3>
<p>Dealing with them depends on the situation. Chances are, however, you have many more histograms than you&#8217;ll ever need. That means that starting from scratch  to figure out when <em>do</em> you need histograms is usually a much simpler task compared to the clean-up of existing onces. If this is your case, then it might be a good idea to wipe out all histograms in a database (gather your stats with <code>FOR ALL COLUMNS SIZE 1</code> clause), and manually add them when you decide that you really need one. </p>
<p>The number of times I have had to go back and add a histogram after a complete wipe-out is surprisingly low, much lower than the number of surprises histograms were causing on these systems. After the wipe-out, those systems never had any bind peeking surprises caused by an excessive number of unnecessary histograms.</p>
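<p>Before wiping anything out, it helps to see how many histograms you actually have. The following is only a rough sketch: it assumes a 10g or later dictionary, where <code>DBA_TAB_COL_STATISTICS</code> exposes a <code>HISTOGRAM</code> column, and it uses <code>SCOTT</code> purely as a placeholder schema name.</p>
<pre>-- list the columns that currently have a histogram
select owner, table_name, column_name, histogram, num_buckets
  from dba_tab_col_statistics
 where histogram &lt;&gt; 'NONE'
 order by owner, table_name, column_name;

-- regather one schema (SCOTT is a placeholder) without any histograms
begin
  dbms_stats.gather_schema_stats(
    ownname    =&gt; 'SCOTT',
    method_opt =&gt; 'FOR ALL COLUMNS SIZE 1');
end;
/</pre>
<p>Going schema by schema rather than database-wide is a judgment call; it simply keeps the blast radius smaller while you watch for regressions.</p>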
<h3>Are there any other sources of unnecessary histograms in my system?</h3>
<p>Absolutely. People are the next source of unnecessary histograms. It constantly surprises me how many people treat histograms as a kind of silver bullet. For example, someone recently told me that <code>select count(*) from table</code> was running slow, and added that, &#8220;maybe we should collect histograms on that table.&#8221; How on earth will a histogram help you to run this query faster? Many histograms on your system might be the result of a complete misunderstanding of how histograms work, what they do, and, more importantly, what they do not do.</p>
<h3>I have a misbehaving query due to incorrect peeking caused by a histogram, and I need to fix it right now. What do I do?</h3>
<p>Don&#8217;t hurry to flush your shared pool. First, as a result of a complete brain cleaning, your instance will have to hard parse everything it had in the library cache, causing tons of redundant work. Second, and this is much more important, these hard parses might well result in incorrect peeking for some other queries. So you might end up in a worse situation than you were in before.</p>
<p>The right way is to get rid of the histogram by collecting stats on the required table with <code>METHOD_OPT =&gt; 'FOR COLUMNS X SIZE 1'</code> and <code>NO_INVALIDATE =&gt; FALSE</code>. This will cause all dependent cursors to be invalidated immediately after the stats have been gathered.</p>
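<p>Spelled out as a complete call, that might look like the sketch below. The table <code>T</code> and column <code>N</code> are stand-ins (the same names as in the comment example in this post), and the table is assumed to live in the current schema:</p>
<pre>begin
  dbms_stats.gather_table_stats(
    ownname       =&gt; user,                   -- assumes T is in your own schema
    tabname       =&gt; 'T',
    method_opt    =&gt; 'FOR COLUMNS N SIZE 1', -- SIZE 1 drops the histogram on N
    no_invalidate =&gt; false);                 -- invalidate dependent cursors right away
end;
/</pre>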
<p>Sometimes, however, you don&#8217;t have enough time to understand what caused the problem (or you simply don&#8217;t have time to regather the stats). If probability theory is on your side (that is, the chances of a good peek are much higher than those of a bad one), all you have to do to invalidate dependent cursors is create a comment on the table:</p>
<pre>SQL&gt; select n from t where n=:n;

         N
----------
         1

SQL&gt; select executions, invalidations
  2   from v$sql
  3   where sql_text = 'select n from t where n=:n';

EXECUTIONS INVALIDATIONS
---------- -------------
         2             0

SQL&gt; comment on table t is '';

Comment created.

SQL&gt; select n from t where n=:n;

         N
----------
         1

SQL&gt; select executions, invalidations
  2   from v$sql
  3   where sql_text = 'select n from t where n=:n';

EXECUTIONS INVALIDATIONS
---------- -------------
         1             1</pre>
<p>The above will invalidate cursors that depend only on a specific table, thus significantly decreasing the risk of side effects.</p>
<h3>Isn&#8217;t that much more work compared to simply turning off bind variable peeking?</h3>
<p>Actually, it is not. If you are starting a new system, all you have to do is change the default parameter from <code>AUTO</code> to <code>REPEAT</code> and you are done. You&#8217;ll have to create all required histograms manually, but that is exactly the goal: to create histograms in a controllable and predictable manner.</p>
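<p>Creating a histogram by hand is a one-off call. Here is a sketch, assuming a hypothetical <code>ORDERS</code> table with a skewed <code>STATUS</code> column in the current schema; neither name comes from a real system:</p>
<pre>begin
  dbms_stats.gather_table_stats(
    ownname    =&gt; user,
    tabname    =&gt; 'ORDERS',                       -- hypothetical table
    method_opt =&gt; 'FOR COLUMNS STATUS SIZE 254'); -- up to 254 buckets, the 10g/11g maximum
end;
/</pre>
<p>Once the histogram exists, subsequent gathers with <code>REPEAT</code> will maintain it without adding new ones elsewhere.</p>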
<p>For existing systems that are plagued by gazillions of histograms, you&#8217;ll have to figure out what to do. Wiping out histograms for the entire database will do pretty much the same for queries with bind variables as turning off bind peeking. I still think, however, that disabling bind peeking in this situation is the wrong choice, since (a) you still run the risk of getting unpredictable results from queries with literals; and (b) you will be doing tons more work during statistics collection, since histogram computation is expensive.</p>
<h3>Is there anything else wrong with disabling bind peeking?</h3>
<p>Running Oracle Database with an underscore parameter makes you different from the rest of the world, and this is not how Oracle Database was tested and intended to be run in the first place (think bugs). While disabling bind peeking seems to be relatively safe, it is also very easy to avoid doing so.</p>
<h3>Get histograms under control</h3>
<p>Making sure new histograms appear in a controllable and predictable manner will be your first step in building a predictable environment. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms/feed/</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
		<item>
		<title>Oracle 11g Result Cache Tested on Eight-Way Itanium</title>
		<link>http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium/</link>
		<comments>http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium/#comments</comments>
		<pubDate>Tue, 27 Nov 2007 17:55:10 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Group Blog Posts]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[buffer cache]]></category>
		<category><![CDATA[dual core]]></category>
		<category><![CDATA[eight-way]]></category>
		<category><![CDATA[Oracle 11g]]></category>
		<category><![CDATA[result cache]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://www.pythian.com/blogs/683/oracle-11g-result-cache-tested-on-eight-way-itanium</guid>
		<description><![CDATA[This will be the final post in my series on Result Caches. In my previous article, I had already got almost everything. Almost &#8212; four CPUs (cores) were still not enough to saturate the single latch. As you&#8217;ve probably already guessed, today we are going with an eight-way test.
Please note that today&#8217;s numbers are different [...]]]></description>
			<content:encoded><![CDATA[<p>This will be the final post in my series on Result Caches. In my <a href="http://www.pythian.com/blogs/660/does-oracle-11gs-result-cache-scale-poorly">previous article</a>, I had already got <em>almost</em> everything. Almost &#8212; four CPUs (cores) were still not enough to saturate the single latch. As you&#8217;ve probably already guessed, today we are going with an eight-way test.</p>
<p>Please note that today&#8217;s numbers are different since I&#8217;m using an entirely different hardware platform. While the four-way tests were done on a 2.4GHz Core 2 Quad box, today&#8217;s eight-way tests were done using four dual-core Itanium 2 CPUs running at 1.1GHz.</p>
<p>Let&#8217;s take a look at the results:</p>
<table border="1" width="431px">
<tr>
<th># of processes</th>
<th>Buffer Cache</th>
<th>% linear (BC)</th>
<th>Result Cache</th>
<th>% linear (RC)</th>
</tr>
<tr>
<td>1</td>
<td>15085</td>
<td>100%</td>
<td>15451</td>
<td>100%</td>
</tr>
<tr>
<td>2</td>
<td>26745</td>
<td>88.65%</td>
<td>28881</td>
<td>93.46%</td>
</tr>
<tr>
<td>3</td>
<td>39144</td>
<td>86.5%</td>
<td>40628</td>
<td>87.65%</td>
</tr>
<tr>
<td>4</td>
<td>52342</td>
<td>86.75%</td>
<td>52625</td>
<td>85.15%</td>
</tr>
<tr>
<td>5</td>
<td>63922</td>
<td>84.75%</td>
<td>62767</td>
<td>81.25%</td>
</tr>
<tr>
<td>6</td>
<td>76336</td>
<td>84.34%</td>
<td>69549</td>
<td>75.02%</td>
</tr>
<tr>
<td>7</td>
<td>88844</td>
<td>84.14%</td>
<td>74208</td>
<td>68.61%</td>
</tr>
<tr>
<td>8</td>
<td>100959</td>
<td>83.66%</td>
<td>76768</td>
<td>62.11%</td>
</tr>
</table>
<p>I made a nice-looking graph from this:</p>
<p><img src='http://www.pythian.com/blogs/wp-content/uploads/bc_rc_scal.GIF' alt='BC vs. RC' /></p>
<p><span id="more-683"></span></p>
<p>The performance drops are quite dramatic. While going from one to two processes can bring us an additional 13430 RC lookups per second, going from seven to eight processes gives us only 2560.</p>
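<p>The &#8220;% linear&#8221; figures in the first table appear to be the measured throughput divided by <em>n</em> times the single-process throughput. That formula is inferred from the numbers rather than stated anywhere, so treat it as an assumption. Checking the eight-process row:</p>
<pre>SQL&gt; select round(100 * 100959 / (8 * 15085), 2) bc_pct_linear,
  2         round(100 * 76768 / (8 * 15451), 2) rc_pct_linear
  3    from dual;

BC_PCT_LINEAR RC_PCT_LINEAR
------------- -------------
        83.66         62.11</pre>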
<p>And here are the stats for <code>Result Cache: Latch</code>:</p>
<table border="1" width="431px">
<tr>
<th># of processes</th>
<th>Gets</th>
<th>Misses</th>
<th>Sleeps</th>
<th>Wait Time</th>
</tr>
<tr>
<td>1</td>
<td>2000001</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>4000002</td>
<td>10004</td>
<td>12</td>
<td>5137</td>
</tr>
<tr>
<td>3</td>
<td>6000003</td>
<td>198365</td>
<td>7338</td>
<td>93164</td>
</tr>
<tr>
<td>4</td>
<td>8000004</td>
<td>473303</td>
<td>37683</td>
<td>330768</td>
</tr>
<tr>
<td>5</td>
<td>10000005</td>
<td>997602</td>
<td>131166</td>
<td>1165493</td>
</tr>
<tr>
<td>6</td>
<td>12000006</td>
<td>1838640</td>
<td>345487</td>
<td>3652257</td>
</tr>
<tr>
<td>7</td>
<td>14000007</td>
<td>3059147</td>
<td>756540</td>
<td>9915421</td>
</tr>
<tr>
<td>8</td>
<td>16000008</td>
<td>4579892</td>
<td>1436130</td>
<td>24922732</td>
</tr>
</table>
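<p>Figures like these can be read from <code>V$LATCH</code>. The query below is only a sketch of one way to do it, not necessarily how the numbers above were collected; it matches on a name prefix because the exact latch name can differ between releases. The counters are cumulative since instance startup, so take a snapshot before and after each run and subtract. <code>WAIT_TIME</code> is reported in microseconds.</p>
<pre>select name, gets, misses, sleeps, wait_time
  from v$latch
 where name like 'Result Cache%';</pre>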
<p>Here&#8217;s another pretty graph, this time of <code>Result Cache: Latch</code> wait time:</p>
<p><img src='http://www.pythian.com/blogs/wp-content/uploads/rc_latch_wait.GIF' alt='RC Latch Wait' /></p>
<p>As you can see from the above figures, it only takes six concurrently running processes before we start observing major <code>Result Cache: Latch</code> contention.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/683/oracle-11g-result-cache-tested-on-eight-way-itanium/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Oracle is Not Compatible with Oracle</title>
		<link>http://www.pythian.com/news/679/oracle-is-not-compatible-with-oracle/</link>
		<comments>http://www.pythian.com/news/679/oracle-is-not-compatible-with-oracle/#comments</comments>
		<pubDate>Thu, 22 Nov 2007 20:42:25 +0000</pubDate>
		<dc:creator>Alex Fatkulin</dc:creator>
				<category><![CDATA[Group Blog Posts]]></category>
		<category><![CDATA[Not on Homepage]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[funny]]></category>

		<guid isPermaLink="false">http://www.pythian.com/blogs/679/oracle-is-not-compatible-with-oracle</guid>
		<description><![CDATA[Just a short blog entry about a funny error message I&#8217;ve got while trying to activate a physical standby database:
SQL&#62; alter database recover managed standby database finish skip standby logfile;
alter database recover managed standby database finish skip standby logfile
*
ERROR at line 1:
ORA-00283: recovery session canceled due to errors
ORA-01110: data file 1: '/oradata/stage/datafile/system_01.dbf'
ORA-01122: database file 1 [...]]]></description>
			<content:encoded><![CDATA[<p>Just a short blog entry about a funny error message I got while trying to activate a physical standby database:</p>
<pre>SQL&gt; alter database recover managed standby database finish skip standby logfile;
alter database recover managed standby database finish skip standby logfile
*
ERROR at line 1:
ORA-00283: recovery session canceled due to errors
ORA-01110: data file 1: '/oradata/stage/datafile/system_01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/oradata/stage/datafile/system_01.dbf'
ORA-01130: database file version 9.2.0.0.0 incompatible with ORACLE
version 9.2.0.0.0</pre>
<p>Database file version <code>9.2.0.0.0</code> is incompatible with ORACLE version <code>9.2.0.0.0</code>, is it?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pythian.com/news/679/oracle-is-not-compatible-with-oracle/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
