Posts Tagged ‘tuning’

Tuning Latch Contention: Cache-buffers-chain latches

By Riyaj Shamsudeen July 25th, 2008 at 3:24 pm
Posted in Oracle
Tags:

Recently, I had an opportunity to tune latch contention for cache buffers chain (CBC) latches. The problem was high CPU-usage combined with poor application performance. A quick review of the statspack report for 15 minutes showed a latch-free wait as the top event, consuming approximately 3600 seconds in an 8-CPU server. CPU usage was quite high, which is a typical symptom of latch contention, due to the spinning involved. v$session_wait showed that hundreds of sessions were waiting for latch free event.

SQL> @waits10g

   SID PID     EVENT         P1_P2_P3_TEXT
------ ------- ------------  --------------------------------------
   294  17189  latch free    address 15873156640-number 127-tries 0
   628  17187  latch free    address 15873156640-number 127-tries 0
....
   343  17191  latch free    address 15873156640-number 127-tries 0
   599  17199  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
   337  17214  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
.....
   695  17228  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
....
   276  15153  latch: cache  address 19878655176-number 122-tries 1
               buffers chains

I will use a two-pronged approach to find the root cause scientifically. First, I’ll find the SQL suffering from latch contention and objects associated with the access plan for that SQL. Next,I will find the buffers involved in latch contention, and map that back to objects. Finally, I will match these two techniques to pinpoint the root cause.

Before I go any further, let’s do a quick summary of the internals of latch operations.

(more…)

Tuning ‘log file sync’ Event Waits

By Riyaj Shamsudeen June 27th, 2008 at 4:26 pm
Posted in Oracle
Tags:

In this blog entry, I will discuss strategies and techniques to resolve ‘log file sync’ waits. This entry is intended to show an approach based upon scientific principles, not necessarily a step-by-step guide. Let’s understand how LGWR is inherent in implementing the commit mechanism first.

Commit Mechanism and LGWR Internals

At commit time, a process creates a redo record (containing commit opcodes) and copies that redo record into the log buffer. Then, that process signals LGWR to write the contents of log buffer. LGWR writes from the log buffer to the log file and signals user process back completing a commit. A commit is considered successful after the LGWR write is successful.

Of course, there are minor deviations from this general concept, such as latching, commits from a plsql block, or IMU-based commit generation, etc. But the general philosophy remains the same.

Signals, Semaphores and LGWR

The following section introduces the internal workings of commit and LGWR interation in Unix platforms. There are minor implementation differences between a few Unix flavours or platforms like NT/XP, for example the use of post-wait drivers instead of semaphores etc. This section will introduce, but not necessarily dive deep into, internals. I used truss to trace LGWR and user process. The command is:

truss -rall -wall -fall -vall -d -o /tmp/truss.log -p 22459

(A word of caution: don’t truss LGWR or any background process unless it is absolutely necessary. Doing so can cause performance issues, or worse, shutdown the database.)

(more…)

Welcome ASH Masters!

By Alex Gorbachev May 21st, 2008 at 9:27 pm
Posted in Group Blog PostsOracle
Tags:

Welcome ASH Masters!

I have already mentioned about the excellent work that Kyle Hailey did around Active Session History (ASH). Kyle has also created ASHMON.

The latest news — there is a new web site — ashmasters.com. This is the place where you can leave your comments and questions about ASH and ASHMON. Wondering how you can query ASH data? There are some ready to use queries. Have a cool idea how to use ASH and query ASH data? Share it there and have it added to the ASH Masters queries toolkit.

I hear, some of you say - “Right… ASH… 10g… Diagnostic Pack…” No panic, there is poor man’s ASH — ASH Simulation. This is the way to get some benefits of ASH while still running Oracle 9i or Oracle 10g without Diagnostic Pack.

Sounds exciting? It does, indeed!

EXPLAIN Cheatsheet

By Sheeri Cabral April 22nd, 2008 at 11:21 am
Posted in Group Blog PostsMySQLNon-Tech Articles
Tags:

At the 2008 MySQL Conference and Expo, The Pythian Group gave away EXPLAIN cheatsheets. They were very nice, printed in full color and laminated to ensure you can spill your coffee* on it and it will survive.

For those not at the conference, or those that want to make more, the file is downloadable as a 136Kb PDF at explain-diagram.pdf

* or tea, for those of us in the civilized world.

Good Database Design is Mightier than Hardware

By Shakir March 7th, 2008 at 10:03 am
Posted in Group Blog PostsOracle
Tags:

Have you ever heard the one about throwing hardware at a software problem?

In one of my previous blog posts, I mentioned something along the lines of—well I’ll just cut and paste . . .

In my experience, the solution to most problems (the ones the caller refers to as “it’s running slow”) are not rooted in hardware, because hardware problems generally cause things to not run at all. It always baffles me when developers and architects prescribe hardware upgrades to make things run faster, because about 80% of the performance-related problems and subsequent solutions I’ve dealt with were resolved by tuning the application.

Well yes, I know you can buy new hardware, and it’s easier. But when it comes to hardware, how many of you have a ten-node RAC cluster running Enterprise Edition with 8GB of RAM on each node, running off a massive SAN?

I’ve been on so many systems that have been running for years–poorly–the way they were, and in a week we can take them apart and have them running without a hitch. We’ve even managed to fix problems that turned out to be the business case to go from RAC back down to a single instance. How much did those customers save on licensing costs?

Back to the example at hand. I have this nifty RAC system that supports some very public and very mission-critical apps, and one day (it was Sunday night) it starts choking. We’re getting enqueues. Slowly they start climbing. Ten nodes came to a crashing halt. I have now seen a ten-node RAC cluster come to crashing halt and completely lock up.

Why, you ask? (more…)

Hotsos Symposium 2008 - The Before

By Alex Gorbachev March 4th, 2008 at 2:50 pm
Posted in Group Blog PostsOracle
Tags:

First of all “the before” time is over — I’m done with my presentation. It’s been the first slot of the day — 8:30 and Cary Milsap was presenting in another hall so what chances do I have to get people in? It turned out that some people actually did show up and quite a few considering the circumstances.

I have mixed feeling on the results. The presentation started very well and I managed to wake people up at the very beginning — thanks to the “equipment” I had at hands (thanks Marco and Riyaj!). You can spot one of them on the photo (thanks for the photo Marco):

(more…)