Adaptive Log File Sync: Oracle, Please Don’t Do That Again

Oct 19, 2012 / By Jeremy Schneider

Tags: , , , ,

Disclaimer: Much of what follows is pure speculation on my part. It could be completely wrong, and I’m putting it out there in the hopes that it’ll eventually be proven one way or the other.

The Summary

  • Underscore parameter _use_adaptive_log_file_sync
    • Default value changed in 11.2.0.3 from FALSE to TRUE
    • Dynamic parameter
  • Enables a new method of communication for LGWR to notify foreground processes of commit
    • Old method used semaphores, LGWR had to explicitly “post” every waiting process
    • New method has the FG processes sleep and “poll” to see if commit is complete
    • Advantage is to free LGWR from CPU work required to inform lots of processes about commits
  • LGWR dynamically switches between old and new method based on load and responsiveness
    • Method can switch frequently at runtime, max frequency is 3 switches per minute (configurable)
    • Switch is logged in LGWR tracefile, we have seen several switches per day
  • Few problems in general, possible issues seem to be in RAC and/or the switching process itself

The Story

We’re working with a customer who had a very successful Exadata go-live last weekend. They moved a large application mostly unchanged from 9i to Exadata – and we’ve generally had very positive feedback. (!!) It might be that just getting on modern hardware and software accounts for this, but it still gives everyone a good feeling.

Nevertheless, there are always a few little adventures. One of these was a mysterious orange glob on the Cloud Control radar Monday at 1am. Orange is the “commit” class – so this was a little surprising! (And reminiscent of Karl Arao’s famous halloween monster!) A few drill-down clicks and we’re looking at “log file sync” events.  To make a long story short, it wasn’t any of the usual suspects, and the necessary diagnostics info wasn’t available after the fact to say what happened with certainty. We’ve done a fairly comprehensive incident report and it hasn’t recurred… So I’m happy for now.

Anyway, I haven’t gotten to the really fun part yet. The really fun part is the #everydaylearning as Yury would say. In the process of analyzing the orange glob, I ended up investigating whether it might be related to a new feature called Adaptive Log File Sync.  I’d never heard of this before so I had quite a bit of fun learning, and it seemed worthwhile to share a few things I found.

Before I dive in, one other important thing: I have to mention Christo Kutrovsky. We were both on-site and digging into this feature at the same time. There was a constant exchange of competing theories and different ideas about how things might work. Together we assembled a strong, in-depth, and working theory very, very quickly – I love working on a great team.

Log File Sync

First off, there are a few basic starting points for anyone troubleshooting time spent in the log file sync event.

In my specific case, I discovered quickly from the ASH data that this was a very unusual situation. This led to some more creative searching in the Oracle Support KnowledgeBase – at which point I discovered something very interesting:

Adaptive Log File Sync

Note 1462942.1 describes a feature whereby LGWR can switch between log write methods.  This feature is enabled through an underscore parameter called _use_adaptive_log_file_sync and the description of this parameter is: adaptively switch between post/wait and polling.

Now I’ll be honest: searching on the internet and searching in Oracle Support’s KnowledgeBase for “adaptive log file sync” yields almost nothing. I do believe in the principle of BAAG – but this is a case where some guesswork might be useful, especially to guide experimentation that could nail down more concrete answers. Hence the disclaimer at the top of this article.

The words “post” and “wait” indicate that we’re talking about semaphores.  For some general background on semaphores, check out the Wikipedia article and for more detail about the unix post() and wait() calls, a good resource is chapter 30 from Andrea Arpaci-Dusseau’s (UW Madison) textbook on Operating Systems.

Oracle uses semaphores extensively. In fact, LGWR and commits are specifically mentioned as an example in the Performance Tuning Guide – and Oracle even has an API to replace semaphore use with a third-party driver for lightweight post-wait implementations. (Like an ODM for your OS kernel.)

If you look for references to “commit” in the Oracle docs, you’ll find the word “post” everywhere when they talk about communication between the foreground processes and LGWR.  Now, remember that a COMMIT has two options: first, IMMEDIATE or BATCH and second, WAIT or NOWAIT.  It looks like this to me:

  • Immediate: FG process will post to LGWR, triggering I/O  (default)
  • Batch: FG process will not post LGWR
  • Wait: LGWR will post FG process when I/O is complete  (default)
  • NoWait: LGWR will not post FG process when I/O is complete

Why change a good thing? My theory is that it ties back to a rare problem Riyaj mentioned in that blog post over 4 years ago.  I’ll quote him:

“LGWR is unable to post the processes fast enough, due to excessive commits. It is quite possible that there is no starvation for CPU or memory, and that I/O performance is decent enough. Still, if there are excessive commits, LGWR has to perform many writes/semctl calls, and this can increase ‘log file sync’ waits. This can also result in sharp increase in redo wastage statistics.”

Maybe Oracle is reading Riyaj’s blog? It appears that they came up with a new algorithm where LGWR doesn’t post. My guess: foreground processes can probably still post LGWR – but LGWR never posts back. Instead, foreground processes in WAIT mode “poll” either a memory structure or LGWR itself. It could be an in-house implementation by Oracle, it could still use semaphores, or it could use message queues somehow [seems like a long shot but the unix poll() call is found there].

There’s one interesting challenge that I can think of in implementing this. With the semaphore approach, all the active commits are *sleeping* (off the CPU) while LGWR flushes the log buffer.  There could be dozens or hundreds of foreground processes simultaneously commiting on a very busy system.  If we switch to a polling method, how do we ensure that these hundreds of processes don’t start spinning and steal CPU from LGWR, making the whole system even worse than it was in the beginning?

The answer might lie in a quick search for “adaptive_log_file_sync” from the underscore parameters. There are five more hidden parameters with that string:

select a.ksppinm name, b.ksppstvl value, a.ksppdesc description
from sys.x$ksppi a, sys.x$ksppcv b
where a.indx = b.indx and a.ksppinm like '\_%adaptive\_log%' escape '\'
order by name
Name Default Description
_adaptive_log_file_sync_high_switch_freq_threshold 3 Threshold for frequent log file sync mode switches (per minute)
_adaptive_log_file_sync_poll_aggressiveness 0 Polling interval selection bias (conservative=0, aggressive=100)
_adaptive_log_file_sync_sched_delay_window 60 Window (in seconds) for measuring average scheduling delay
_adaptive_log_file_sync_use_polling_threshold 200 Ratio of redo synch time to expected poll time as a percentage
_adaptive_log_file_sync_use_postwait_threshold 50 Percentage of foreground load from when post/wait was last used

It appears that there are a number of knobs to turn with this new algorithm – and it looks like Oracle is somehow dynamically calculating a “polling interval”.  Furthermore, it seems to be taking “scheduling delay” into account.

It also seems that by default, we’re limited to switching modes every 20 seconds (3 per minute).  This switching seems to be controlled by “thresholds” – and the threshold for enabling the new polling mode seems to be based on time.  From the default percentage, it looks to me like Oracle won’t switch until it thinks the poll time will be less than half of the current post time.  For switching back it also seems to be another “half” percentage (50), though I’m not sure what “foreground load” might mean.

Now there’s one other underscore parameter which I noticed while looking for things related to log file sync:

Name Default Description
_fg_sync_sleep_usecs 0 Log file sync via usleep

This parameter is interesting for one reason: The adaptive sync parameters were introduced in 11g, but this parameter was introduced in 10g. And if you think about the name, it actually sounds very similar to the “polling” strategy and not like a semaphore strategy at all! Foreground processes call usleep() during a log sync – if you’re calling wait() then you don’t need to sleep. But if you’re polling, you definitely need to sleep. Maybe Oracle has been working on this idea since 10g? I wouldn’t put it past them. :)

So how do you know if you’re using this feature?  The most obvious sign will be in the LGWR trace file.

There will be messages each time that a switch happens, looking something like this:

*** 2012-10-16 01:47:50.289
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=8 switch_sync_count_delta=59

*** 2012-10-16 01:47:50.289
Log file sync switching to post/wait
Current approximate redo synch write rate is 2 per sec

*** 2012-10-16 02:51:19.285
kcrfw_update_adaptive_sync_mode: post->poll long#=51 sync#=352 sync=4600 poll=1061 rw=500 rw+=500 ack=7 min_sleep=1061

*** 2012-10-16 02:51:19.285
Log file sync switching to polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 117 per sec

I mentioned that the underscore parameters were introduced in 11g; however, my current understanding is that only in 11.2.0.3 did the default value of _use_adaptive_log_file_sync change to true!! Is it comforting that they waited so long, or is it scary that they’d make this change with no available documentation or troubleshooting information?  I guess it depends on your perspective. Here’s mine:

Explanation Please?

  • This is a change of critical, core code that impacted every single database installation and upgrade.
  • I think it is a strange departure from Oracle’s normal practices to change such important code with absolutely no announcement, documentation, or technical troubleshooting information even in their customer-only support database.
  • In my opinion this requires explanation; they can do (and have done) much better.

Regardless, it’s now been 12 months since 11.2.0.3 was released. At present, it’s only mentioned in a single support article. This offers at least some evidence that adaptive log file sync hasn’t caused widespread panic.  For the past year, many of you have been using this feature without even knowing – and there weren’t enough problems to merit even two Oracle Support notes.

I sent a few emails out to friends who also tend to work on large, loaded systems. I wasn’t the first person to stumble across this – James Morle mentioned it in a tweet a few months ago, and (not surprisingly) Riyaj has stumbled across the feature as well. It seems to me that the issues that have been encountered intersect with RAC and/or the switching process itself. But my informal survey was hardly scientific.

Overall, I think Oracle dodged the bullet this time. But they still made an important change without supporting documentation. To my friends inside the big red mothership: Please don’t do that again! We like it better the other way.

Have you ever heard of this new 11.2.0.3 feature? Have you heard of the underscore parameter? I’d love to hear your stories!

8 Responses to “Adaptive Log File Sync: Oracle, Please Don’t Do That Again”

  • [...] Jeremy Schneider: “Adaptive Log File Sync: Oracle, Please Don’t Do That Again” [...]

  • Chuan Wang says:

    Nice Article.
    But I doubt Oracle will stop doing this – remember the “famous” _complex_view_merging? They changed it back and forth a few times without too much announcement and documentation.
    They have done it before, are doing it now and will probably do it later.

    Cheers,

    Chuan

  • Elina Tuominen says:

    Thanks for this information !
    I have a situation, that after upgrading from 11.2.0.2 to 11.2.0.3 on extended rac cluster (two nodes), the other node is facing sometimes over >100s (!!) log file sync. The other node is working ok. There are also other databases in the same environment and they are working ok. There is no log writer delays at the same time. Nobody finds anything wrong the environment and I have been starting to wonder about this Adaptive log file sync. The other node (which does not face any problems) does not have this pooling on currently. The other one has is on (according to lgwr-file). Have you been facing same kind of problems in rac ? I am thinking of taking it off, but I’m still waiting an answer from Oracle support. This does happen maybe once a day or once a week. Really bad feature, if this can cause this problem.

  • @Elina – if adaptive log file sync was the cause of your problem then you would see a message in the lgwr trace file indicating that the sync method changed at the exact time when you began seeing >100s log file sync events. you could verify exactly when the log file sync events started by looking in the ASH. in our case the time in the trace file did match exactly with the start time of our hung sessions.

  • valery says:

    Thank you, this article has helped to solve our problems.

  • Verified, SR-backed issue of this in two databases upgraded from 11.2.0.2 to 11.2.0.3 apprx two months ago. It can start “suddenly” as in one night during/after a large data load, log file sync jumped from 3ms to 11ms and stayed there, consistently, until we restarted the database days later. Perfect example, high log file sync, low log file parallel write, pwrites() in dtruss lined up perfectly with log file parallel writes, and no change to log file parallel write time when log file sync spiked up to its ‘new normal’. A database restart fixes it, but as this Oracle doc you quoted (and the Oracle engineer gave us – 1462942.1) says, it comes back. The bullet points at the beginning of that doc were a 100% match for our experience. We had the Solaris kernel folks engaged as well, they were convinced it was memory fragmentation, then Oracle finally suggested this after days of round and round about non-matching ‘fixes’. Our tech said it is fixed in 11.2.0.4. The doc says 11.2.0.3.1, we’re on 11.2.0.3.7, Solaris 10.

    Thanks for putting this out there. I agree, some warning would have been nice.

  • sudershan virdi says:

    Finally hit this issue on 11.2.0.3, lgwr trace file clearly shows:
    *** 2014-05-08 08:17:18.377
    Log file sync switching to polling
    Current scheduling delay is 1 usec
    Current approximate redo synch write rate is 158 per sec

    Did you guys switch “_use_adaptive_log_file_sync” = FALSE; as described in Doc ID 1462942.1?

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>