Quantifying Commit Time

Apr 13, 2006 / By Christo Kutrovsky

Tags: ,

Have you ever wondered how fast you can commit ?

I’ve been introduced to an application which was doing commits in every step of a higher level transaction. The result was a significant portion of the time was spent in “log file sync” and there was nothing you could do. A commit can only be so fast. Your overall high level transaction speed will be limited by how fast you can commit.

No matter what you do, a true commit will take time. The time will be very small, but it will still be more then doing an in memory query of a lookup table. And worst of all, you will serialize at the entire instance on it. If a commit has caused a “log file sync”, another session will have to wait for the already started flush, and then wait for it’s data to be flushed in effect waiting twice the time.

So I did a test on how fast you can commit with 1 session. This will be the absolute maximum you can achieve in commit performance.

Starting in Oracle 10g testing commit time is a bit easier. With the introduction of the new commit parameters, “batch” and “nowait” and more importantly, the “wait” counterpart.

The default commit does this:

1. writes ‘end of transaction’ record into the redo buffer

2. posts (sends message) LGWR, telling it to flush the redo buffer to disk

3. waits for the flush to finish into the well known ‘log file sync’ event.

The “nowait” and “batch” parameters control how far in those steps you want to go. Basically:

- commit write nowait – will avoid step 3 (it will still tell LGWR to flush the buffer)

- commit write batch,nowait – will avoid step 2 and 3, allowing the data to remain buffered into the LOG_BUFFER until either someone else commits, causing a flush, or some of the background events cause the buffer to be flushed asynchronously. The background events are:

- the buffer is 1/3 full

- the buffer is 1 mb full

- 3 seconds have elapsed

A lesser known fact is that PL/SQL will silently optimize your commits to “commit write nowait”. Only the last commit in a given call will be a real “commit”.

The reason for this optimisation is the nature of client/server relationships. In theory, if you call a PL/SQL procedure, the client gets no information about its progress until the procedure has returned. So why wait for disk flushes?

You can easily verify this by running the following example (10g only):

We’ll create this table in and use it in all of our tests:

create table c_test (a number);

Then run this code :

set timing on 
declare i number; 
begin 
for i in 1..10000 loop 
insert into c_test values (55); 
commit; 
end loop; 
end; 
/

And notice how much time and how many “log file sync” waits (the count, not time) you had.

Then run this code:

declare i number; 
begin 
for i in 1..10000 loop 
insert into c_test values (55); 
commit write wait; 
end loop; 
end; 
/

Now you will get “the real” number. The actual amount of time it took to wait for 10 000 disk syncs.

I specifically choose 10 000 for the loop number, so that I can easily measure the average time to commit. This number however, will not be pure as there’s some extra time for the call to return back to sqlplus, but it’s close enough.

I got 4.6 seconds. Doing the math, this means a single disk sync (or SAN sync should I say) took 0.46 ms. How much did you get? Please post your results along with a quick overview of the system you ran this on in the comments to this post!

For the record, this is LINUX Red Hat 4, with a 3.8 Ghz XEON with 2mb cpu cache with 64 bit Oracle. The redo is on a dedicated ASM diskgroup, using a single ASMLIB disk, which is a raid 1 (mirror) array of 2×36 gb hard drives, 15k RPM inside an IBM SAN, with 1gb of mirrored cache (512mb effective) with 2gbit fibre channel adapters. The SAN setting for “write-back cache” is on.

Now here’s the strange thing. I found that any other disk IO activity on the machine will dramatically increase your commit time. It doesn’t matter if it’s on a separate spindle/device/LUN. The time to complete the operation is so short that it is affected by the smallest change in the working conditions. In fact it is so small that I’m pretty sure just taking a deep breath will probably affect it. This is with the default IO scheduler. I havent tried any of the fancy ones.

I noticed this behaviour during archiving. I actually have 2 pairs (4 drives) with 4 redo groups on alternating pairs. So when I write to side A, side B can be archived and vice versa. So no concurrent IO on the redo spindles whatsoever.

If you’re wondering the time difference is about 30% in my case.

Running multiple sessions at the same time, and monitoring “service time” and “average wait time” with iostat deserve a separate post.

7 Responses to “Quantifying Commit Time”

  • Interesting

    On my windows XP laptop (single ide 5400rpm disk – (details at seagate) and 1gb ram) 10.2.0.1 gets the following results

    SQL> set timing on
    SQL> set autotrace on statistics
    SQL> declare i number;
    2 begin
    3 for i in 1..10000 loop
    4 insert into c_test values (55);
    5 commit;
    6 end loop;
    7 end;
    8 /
    PL/SQL procedure successfully completed.Elapsed: 00:00:01.16
    SQL> declare i number;
    2 begin
    3 for i in 1..10000 loop
    4 insert into c_test values (55);
    5 commit write wait;
    6 end loop;
    7 end;
    8 /
    PL/SQL procedure successfully completed.Elapsed: 00:00:03.03
    SQL>
    I don't have 10.2 on a SAN anywhere (and commit write nowait/wait is a 10.2 thing I think not a 10g thing).
    I don't have 10.2 on a SAN anywhere (and commit write nowait/wait is a 10.2 thing I think not a 10g thing).

    I don't have 10.2 on a SAN anywhere (and commit write nowait/wait is a 10.2 thing I think not a 10g thing).

    I don't have 10.2 on a SAN anywhere (and commit write nowait/wait is a 10.2 thing I think not a 10g thing).

    I don’t have 10.2 on a SAN anywhere (and commit write nowait/wait is a 10.2 thing I think not a 10g thing).

  • Christo Kutrovsky says:

    Based on the hard disk info you gave me, these numbers make no sense. There has to be caching involved, and the commit is not a true commit.

    The disk rotation is 5400 rpm. Thats 60 / 5400 = 0.01111 sec per rotation. This means, if the heads are in position, it could still take up to 11 ms before they can write. On the data sheet (the pdf on the link you gave me) it sais “Average Latency (msec) 5.6″. Which is exactly half the rotation latency, so it makes sense.

    Your commit latency: 0.3 ms
    Your HDD’s average latency: 5.6 ms

    The data is been cached somewhere. Most likelly it’s the HDD’s cache. Check this article:

    http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-disk.html

    Section 11.12.1.5.

    “….The problem is that IDE drives lie about when a write completes. With IDE write caching turned on, IDE hard drives not only write data to disk out of order, but will sometimes delay writing some blocks indefinitely when under heavy disk loads….”

    You can try turning off write caching from the device manager, rebooting and retrying the same test.

    Properties on my computer
    hardware tab
    device manager button
    open Disk drives tree
    double click your drive
    one of the tabs will have a “write caching” checkbox.

  • Mathew Butler says:

    In terms of ongoing commit performance ( ie: steady throughput ), if the initiator of a redo log buffer write is 1/3 full, 3s or 1 MB. Do you agree that the best size for the redo log might be 3 times the maximum write I/O size for the operating system? In this case, the DB only needs to wait for a single I/O to finish for the commit to complete.

    I’m interested in your thoughts.

  • Christo Kutrovsky says:

    It makes perfect sense, if all we knew about how it works was all there was.

    There is a lot of internals that we dont know about and only overhear from rumors. Such as
    - per cpu pools – redo buffer is split in multiple latches based on number of CPUs
    - private redo – a session will get it’s own region in the redo buffer for “private” use
    - redo reservation – similar to above feature
    - commit extra options – now you can “commit” in memory, without even telling log writer to flush. (commit batch nowait)

    For these reasons, it’s always good to have a larger buffer to allow these optimizations to take place.

    The worst that can happen is that this memory “is wasted” and never used. There’s other negative impacts, but we’re comparing 3 mb vs 30 mb, not < 3 mb.

    So in my opinion, just set it large and dont worry about it. I set it to 32-64 mb.

  • Jarett Kulm says:

    Slightly modified version of your script. This is on a quad SunFire with CentOS 4.3 using DEADLINE IO Scheduler. Against Oracle 10.2.0.1 64-Bit AMD64. This is against a Hitachi HDS SAN. Redo is on FC disks with EXT3, data is on SATA disks with EXT3.

    This is an active box, so there are a few stray waits showing up.

    SQL> DECLARE
    2 PROCEDURE do_loop (p_type IN VARCHAR2) AS
    3 l_start NUMBER;
    4 l_loops NUMBER := 10000;
    5 v1 NUMBER;
    6 v2 NUMBER;
    7 BEGIN
    8 EXECUTE IMMEDIATE ‘ALTER SESSION SET COMMIT_WRITE=”’ || p_type || ””;
    9 EXECUTE IMMEDIATE ‘TRUNCATE TABLE commit_test’;
    10 EXECUTE IMMEDIATE ‘select TOTAL_WAITS from v$system_event where event like ”log file sync%”’ into v1;
    11
    12 l_start := DBMS_UTILITY.get_time;
    13 FOR i IN 1 .. l_loops LOOP
    14 INSERT INTO commit_test (id, description)
    15 VALUES (i, ‘Description for ‘ || i);
    16 COMMIT;
    17 END LOOP;
    18 EXECUTE IMMEDIATE ‘select TOTAL_WAITS – ‘ || v1 || ‘ from v$system_event where event like ”log file sync%”’ into v2;
    19 DBMS_OUTPUT.put_line(RPAD(‘COMMIT_WRITE=’ || p_type, 30) || ‘: ‘ || (DBMS_UTILITY.get_time – l_start) || ‘ — TOTAL WAITS: ‘ || v2 );
    20 END;
    21 BEGIN
    22 do_loop(‘WAIT’);
    23 do_loop(‘NOWAIT’);
    24 do_loop(‘BATCH’);
    25 do_loop(‘IMMEDIATE’);
    26 do_loop(‘BATCH,WAIT’);
    27 do_loop(‘BATCH,NOWAIT’);
    28 do_loop(‘IMMEDIATE,WAIT’);
    29 do_loop(‘IMMEDIATE,NOWAIT’);
    30 END;
    31 /
    COMMIT_WRITE=WAIT : 934 — TOTAL WAITS: 10022
    COMMIT_WRITE=NOWAIT : 131 — TOTAL WAITS: 1
    COMMIT_WRITE=BATCH : 132 — TOTAL WAITS: 0
    COMMIT_WRITE=IMMEDIATE : 148 — TOTAL WAITS: 0
    COMMIT_WRITE=BATCH,WAIT : 1035 — TOTAL WAITS: 10008
    COMMIT_WRITE=BATCH,NOWAIT : 128 — TOTAL WAITS: 7
    COMMIT_WRITE=IMMEDIATE,WAIT : 1070 — TOTAL WAITS: 10012
    COMMIT_WRITE=IMMEDIATE,NOWAIT : 135 — TOTAL WAITS: 0

    PL/SQL procedure successfully completed.

    Elapsed: 00:00:37.84
    SQL> /
    COMMIT_WRITE=WAIT : 1137 — TOTAL WAITS: 10020
    COMMIT_WRITE=NOWAIT : 130 — TOTAL WAITS: 1
    COMMIT_WRITE=BATCH : 126 — TOTAL WAITS: 0
    COMMIT_WRITE=IMMEDIATE : 132 — TOTAL WAITS: 0
    COMMIT_WRITE=BATCH,WAIT : 1049 — TOTAL WAITS: 10019
    COMMIT_WRITE=BATCH,NOWAIT : 126 — TOTAL WAITS: 2
    COMMIT_WRITE=IMMEDIATE,WAIT : 1041 — TOTAL WAITS: 10013
    COMMIT_WRITE=IMMEDIATE,NOWAIT : 131 — TOTAL WAITS: 6

    PL/SQL procedure successfully completed.

    Elapsed: 00:00:39.70
    SQL> /
    COMMIT_WRITE=WAIT : 945 — TOTAL WAITS: 10001
    COMMIT_WRITE=NOWAIT : 144 — TOTAL WAITS: 0
    COMMIT_WRITE=BATCH : 127 — TOTAL WAITS: 4
    COMMIT_WRITE=IMMEDIATE : 130 — TOTAL WAITS: 4
    COMMIT_WRITE=BATCH,WAIT : 962 — TOTAL WAITS: 10016
    COMMIT_WRITE=BATCH,NOWAIT : 126 — TOTAL WAITS: 0
    COMMIT_WRITE=IMMEDIATE,WAIT : 998 — TOTAL WAITS: 10021
    COMMIT_WRITE=IMMEDIATE,NOWAIT : 145 — TOTAL WAITS: 0

    PL/SQL procedure successfully completed.
    Elapsed: 00:00:41.58

  • Christo Kutrovsky says:

    Interesting. Your results differ significantly from the different executions. I assume the SAN is not used for only that ? Are you sure you have dedicated disks ?

    Also your times for “commit wait” are twice as much compared to mine. And we’re both fiber channel.

  • [...] risultati di un’altro test analogo si trovano in un commento ad un post di Christo Kutrovsky su pythian e sono [...]

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>