THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Database tuning: ratio vs. rate

Baron makes an excellent point in Why you should ignore MySQL’s key cache hit ratio — ratio is not the same as rate. Furthermore, rate is [often] the important thing to look at.

This is something that, at Pythian, we internalized a long time ago when thinking about MySQL tuning. In fact, mysqltuner 2.0 takes this into account, and the default configuration includes looking at both ratios and rates.

If I told you that your database had a ratio of temporary tables written to disk of 20%, you might think “aha, my database is slow because of a lot of file I/O caused by writing temporary tables to disk!”. However, that 20% ratio may actually mean a rate of 2 per hour — which is most likely not causing excessive I/O.

To get a sense of this concept, and also how mysqltuner works, I will show the lines from the mysqltuner default configuration that deal with temporary tables written to disk. Read the rest of this entry . . .

Pythian Offers Customized Training/Consulting Package

Yesterday, The Pythian Group issued a press release about my book, Pythian’s partnership with Sun, and our new “MySQL Adoption Accelerator Package”. I am not a marketing guru, but I can tell you what we the package means in terms of new work that the MySQL teams have been doing.

Basically, the MySQL Adoption Accelerator Package combines customized training with a comprehensive audit of systems. The name “Adoption Accelerator” makes it sound like it’s only for new applications that are almost ready to go live. What the program actually does is have us evaluate your systems, and intensively train you in the areas you want and need. The program is designed to suit all your needs, whether it’s teaching you about one topic (say, query optimization) or an entire range of topics, from Architecture to ZFS (special issues with running MySQL on ZFS, that is, but that did not fit a cute “from A-Z” model…).

Whether you have already adopted MySQL or are thinking of converting from Oracle, DB2, Microsoft SQL Server or even sqlite, this new package may be what you need.

And now, the full text of the press release, for the curious:

‘MySQL Administrator’s Bible’ Hits the Bookstands: Pythian Launches MySQL Accelerator Adoption Package

The Pythian Group, the leading provider of remote database services, is pleased to announce that the much-anticipated MySQL Administrator’s Bible, written by employee Sheeri K. Cabral, is now available.
Read the rest of this entry . . .

What Exactly is Swappiness?

This is an issue that keeps rearing its ugly head over and over again, and since it greatly affects performance, it is most important that DBAs of any DMBS running on Linux come to grips with it. So I decided to do some research and try different settings on my notebook. Here are my findings.

What can you find on the web?

A Wikipedia search for the word swappiness will come up empty (any volunteers out there want to write an article?). A Google search will show some pretty old material—the best article I found is from 2004: Linux: Tuning Swappiness. This article includes a detailed discussion with some interesting remarks by Andrew Morton, a Linux kernel maintainer.

So, what is swappiness?

Read the rest of this entry . . .

InnoDB logfiles

The unsung heroes of InnoDB are the logfiles. They are what makes InnoDB automatic crash recovery possible.

Database administrators of other DBMS may be familiar with the concept of a “redo” log. When data is changed, affected data pages are changed in the innodb_buffer_pool. Then, the change is written to the redo log, which in MySQL is the InnoDB logfile (ib_logfile0 and ib_logfile1). The pages are marked as “dirty”, and eventually get flushed and written to disk.

If MySQL crashes, there may be data that is changed that has not been written to disk. Those data pages were marked as “dirty” in the innodb_buffer_pool, but after a MySQL crash the innodb_buffer_pool no longer exists. However, they were written to the redo log. On crash recovery, MySQL can read the redo log (InnoDB log files) and apply any changes that were not written to disk.

That is the basic functionality of the InnoDB log files. Given this, let’s look at some of the different parameters and their ramifications.

innodb_log_files_in_group is set with a default of 2. The logfiles are written in a circular manner — ib_logfile0 is written first, and when it has reached its maximum size, then ib_logfile1 will be written to.

innodb_log_file_size is the size of each log file in the log group. The total, combined size of all the log files has to be less than 4 Gb (according to the MySQL manual). Because the logfiles contain changes in the buffer pool that have not been written to disk, the total, combined size of all the log files should not be more than the innodb_buffer_pool_size.

If all the log files in the group are full of changes that have not been written to disk, MySQL will start to flush dirty pages from the InnoDB buffer pool, writing the changes to disk. If the log files are small, changes will be written to disk more often, which can cause more disk I/O.

When InnoDB does a crash recovery, it reads the log files. If the log files are large, it will take longer to recover from a crash. If innodb_fast_shutdown is set to 0, the log files are purged when MySQL shuts down — larger files mean a longer shutdown time. The default for innodb_fast_shutdown is 1, which means that the log files are not purged before a shutdown. Starting in MySQL 5.0.5, you can set it to 2, which simulates a crash, and at the next startup InnoDB will do a crash recovery.

innodb_flush_log_at_trx_commit controls how often the log files are written to. A value of 0 causes the log files to be written and flushed to disk once per second. The default is 1, which causes the log buffer to be written and flushed to disk after every transaction commit. The value can also be set to 2, which causes the log buffer to be written after every transaction commit and flushes the log files to disk once per second. A value of 2 means that MySQL might think that some changes are written to the log file, but do not persist in the log file after an operating system crash, because the log file was not flushed to disk before a crash.

Note that some filesystems are not honest about flushing to disk, so even though you may have the default value of 1, your system may be acting as if it has a value of 2. Setting this parameter to 2 means that there will be less I/O, at the cost of not being able to recover data from a crash.

innodb_flush_method changes how InnoDB opens and flushes data and log files. See the manual for details; the end result is a tradeoff in I/O performance versus whether or not an operating system crash would leave the InnoDB log files in an inconsistent state.

innodb_log_buffer_size is the write buffer for InnoDB log files. The larger the buffer is, the less often the log files are written to. This can save I/O.

Consistent Gets not Necessarily the Best Way to Look at Query Performance

This post is for those who think Consistent Gets is the only thing that matters. It’s not. That’s why Statspack and AWR provide not only the top queries sorted by Consistent Gets but also Sorted by IO, CPU, Cluster Waits, and so on. I won’t argue. Check for yourself.

I’ve run the queries that follow on top of 10.2.0.3 on Linux X86_64.

Sample Table

Create and Fill up a table to run your queries. You’ll find the script you need below:

create table X1(a number,b number);

begin
   for i in 1..1000000 loop
      insert into X1 values (i,mod(i,100000));
   end loop;
end;
/

commit;

exec dbms_stats.gather_table_stats(user, 'X1');

Case 1: 4164 Consistent Gets for 0.14 seconds

First, let’s assume that a few Consistent Gets means good performance. Look at the following query:

Read the rest of this entry . . .

Tuning Latch Contention: Cache-buffers-chain latches

Recently, I had an opportunity to tune latch contention for cache buffers chain (CBC) latches. The problem was high CPU-usage combined with poor application performance. A quick review of the statspack report for 15 minutes showed a latch-free wait as the top event, consuming approximately 3600 seconds in an 8-CPU server. CPU usage was quite high, which is a typical symptom of latch contention, due to the spinning involved. v$session_wait showed that hundreds of sessions were waiting for latch free event.

SQL> @waits10g

   SID PID     EVENT         P1_P2_P3_TEXT
------ ------- ------------  --------------------------------------
   294  17189  latch free    address 15873156640-number 127-tries 0
   628  17187  latch free    address 15873156640-number 127-tries 0
....
   343  17191  latch free    address 15873156640-number 127-tries 0
   599  17199  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
   337  17214  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
.....
   695  17228  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
....
   276  15153  latch: cache  address 19878655176-number 122-tries 1
               buffers chains

I will use a two-pronged approach to find the root cause scientifically. First, I’ll find the SQL suffering from latch contention and objects associated with the access plan for that SQL. Next,I will find the buffers involved in latch contention, and map that back to objects. Finally, I will match these two techniques to pinpoint the root cause.

Before I go any further, let’s do a quick summary of the internals of latch operations.

Read the rest of this entry . . .

Tuning ‘log file sync’ Event Waits

In this blog entry, I will discuss strategies and techniques to resolve ‘log file sync’ waits. This entry is intended to show an approach based upon scientific principles, not necessarily a step-by-step guide. Let’s understand how LGWR is inherent in implementing the commit mechanism first.

Commit Mechanism and LGWR Internals

At commit time, a process creates a redo record (containing commit opcodes) and copies that redo record into the log buffer. Then, that process signals LGWR to write the contents of log buffer. LGWR writes from the log buffer to the log file and signals user process back completing a commit. A commit is considered successful after the LGWR write is successful.

Of course, there are minor deviations from this general concept, such as latching, commits from a plsql block, or IMU-based commit generation, etc. But the general philosophy remains the same.

Signals, Semaphores and LGWR

The following section introduces the internal workings of commit and LGWR interation in Unix platforms. There are minor implementation differences between a few Unix flavours or platforms like NT/XP, for example the use of post-wait drivers instead of semaphores etc. This section will introduce, but not necessarily dive deep into, internals. I used truss to trace LGWR and user process. The command is:

truss -rall -wall -fall -vall -d -o /tmp/truss.log -p 22459

(A word of caution: don’t truss LGWR or any background process unless it is absolutely necessary. Doing so can cause performance issues, or worse, shutdown the database.)

Read the rest of this entry . . .

Welcome ASH Masters!

Welcome ASH Masters!

I have already mentioned about the excellent work that Kyle Hailey did around Active Session History (ASH). Kyle has also created ASHMON.

The latest news — there is a new web site — ashmasters.com. This is the place where you can leave your comments and questions about ASH and ASHMON. Wondering how you can query ASH data? There are some ready to use queries. Have a cool idea how to use ASH and query ASH data? Share it there and have it added to the ASH Masters queries toolkit.

I hear, some of you say – “Right… ASH… 10g… Diagnostic Pack…” No panic, there is poor man’s ASH — ASH Simulation. This is the way to get some benefits of ASH while still running Oracle 9i or Oracle 10g without Diagnostic Pack.

Sounds exciting? It does, indeed!

EXPLAIN Cheatsheet

At the 2008 MySQL Conference and Expo, The Pythian Group gave away EXPLAIN cheatsheets. They were very nice, printed in full color and laminated to ensure you can spill your coffee* on it and it will survive.

For those not at the conference, or those that want to make more, the file is downloadable as a 136Kb PDF at explain-diagram.pdf

* or tea, for those of us in the civilized world.

Good Database Design is Mightier than Hardware

Have you ever heard the one about throwing hardware at a software problem?

In one of my previous blog posts, I mentioned something along the lines of—well I’ll just cut and paste . . .

In my experience, the solution to most problems (the ones the caller refers to as “it’s running slow”) are not rooted in hardware, because hardware problems generally cause things to not run at all. It always baffles me when developers and architects prescribe hardware upgrades to make things run faster, because about 80% of the performance-related problems and subsequent solutions I’ve dealt with were resolved by tuning the application.

Well yes, I know you can buy new hardware, and it’s easier. But when it comes to hardware, how many of you have a ten-node RAC cluster running Enterprise Edition with 8GB of RAM on each node, running off a massive SAN?

I’ve been on so many systems that have been running for years–poorly–the way they were, and in a week we can take them apart and have them running without a hitch. We’ve even managed to fix problems that turned out to be the business case to go from RAC back down to a single instance. How much did those customers save on licensing costs?

Back to the example at hand. I have this nifty RAC system that supports some very public and very mission-critical apps, and one day (it was Sunday night) it starts choking. We’re getting enqueues. Slowly they start climbing. Ten nodes came to a crashing halt. I have now seen a ten-node RAC cluster come to crashing halt and completely lock up.

Why, you ask? Read the rest of this entry . . .

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @sheeri: #confoo talk "Bending Queries to your Will with EXPLAIN" slides http://bit.ly/explainslides & handout
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more