THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Pythian adds another Certified MySQL Cluster DBA

The Pythian Group added another feather to its cap today. Our Nicklas Westerlund passed his MySQL Cluster DBA Certification exam, making him part of an elite group of 57 MySQL Cluster Certified DBAs worldwide, and now two here at Pythian, the other being Augusto Bott. We have a wealth of hands-on experience setting up and maintaining the MySQL Cluster. Nick’s and Augusto’s certified creds make them our go-to guys for MySQL Clustering.

Congratulations, Nick!

MySQL: RENAME TABLE on Transactional Tables Can Jeopardize Slave Data

Do you have a master-slave MySQL set up?  Ever do DDL changes on the master?  You may be hit with a serious data integrity bug.  Read on.

One of our clients does a regular rename tables on the master to keep the current table small and archive off old data.  We’d occasionally be hit by a ‘duplicate key’ error on the slave on the current table and have to resolve it manually.  Digging into the issue, I managed to replicate it on demand and filed bug 39675 with MySQL, which subsequently has been verified and slated for fix, though from what it seems only in version 6.0.  The bug affects all versions of MySQL from 4.1 to 6.0.

In a nutshell, here is what happens. The rename tables command only checks for pending transactions or locks in the current session.  If there is a pending transaction in another session on the table being renamed, the rename will succeed, but the order in which the transaction is written to the binlog will be different from the order in which the transactions were applied on the master.  This means that the data on the slave will now be out of sync for this table.

Here’s a test-case:

Read the rest of this entry . . .

Fedora 9’s Broken Install

Fedora 9 was released on the 13th. I waited a whole three days to make sure I wasn’t going to be the beta-tester. Then I tried out the live release, and finally decided to upgrade my main workstation to Fedora 9 today. To be sure I wouldn’t mess stuff up, I used the DVD installer to upgrade.

The upgrade finished fine, but when I rebooted, XChat would not run.

[14:15:53]$ xchat
xchat: error while loading shared libraries: libperl.so: cannot open shared object file: No such file or directory

What did rpm have to say about this?

[14:26:31]$ rpm -q xchat
xchat-2.8.4-11.fc8.i386

[14:26:39]$

Hmm . . .  looks like the package did not get upgraded. Wondering how many others did not get upgraded, I did a quick check:

[14:26:39]$ rpm -qa | grep fc8|wc -l
139

Wow! 139 packages still carry the fc8 tag! Next I checked if the updates repository has the updates:

[root@rajlin ~]# yum update
...

...
Transaction Summary
===================================================================
Install     18 Package(s)
Update     148 Package(s)
Remove       0 Package(s)

While I didn’t check all the packages in the list, XChat is definitely there:

xchat                   i386       1:2.8.4-15.fc9   fedora            1.3 M

So did Fedora just push out a release with old packages just to stay on schedule, and is actually releasing packages now hoping nobody would notice? Read the rest of this entry . . .

Installing Oracle on Fedora 8

Apparently somebody who reads blogs regularly found one that said it Fedora 8 bombed because he couldn’t install oracle on it. So I took on the challenge.

I have to say installing Fedora 8 was the most difficult part of the challenge. C’mon guys, after giving us Fedora 1,2,3,4,5,6 and 7, you give us this beautiful shiny new release, complete with a 2.6.23 kernel and you couldn’t write a half-decent bug-free installer? I had to abandon the upgrade from Fedora 7 on my laptop and resort to a full new install. Of course I backed up my home dir and stuff so I could quickly get back my working configs and settings and my laptop is back in business.. but I digress.

xcb_xlib.c:50: xcb_xlib_unlock: Assertion 'c->xlib.lock' failed.

Thats the problem when you try to run the oracle installer on Fedora 8. Apparently its a bug with either Java or with the X libraries or just the way they interact, with blame being shifted around. See this wonderful thread: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6532373

The workaround given is to patch the libmawt.so with sed (!). The link above has all the gory details, but how do you do it for the oracle installer? The jre is compressed and there is no libmawt.so or any other .so to be found..

Here’s how in a nutshell (this was on Fedora 8 x86_64 but I see no reason why it would not work on i386):

  • Install the required packages as for Fedora 7
  • Set up the kernel prerequisites etc. as for Fedora 7
  • Edit /etc/redhat-release to read redhat-4
  • Startup the installer. it will fail with the famous xcb_xlib.c:50: xcb_xlib_unlock: Assertion `c->xlib.lock' failed.
  • Now go to the /tmp directory and find the OraInstallXXXX directory that the installer left behind when it failed
  • Rename it to jre under /tmp (cd /tmp; mv OraInstallXXXX jre)
  • Now run this: cd /tmp/jre/jre/1.4.2/lib/i386; sed -i 's/XINERAMA/FAKEEXTN/g' libawt.so (yes, its libawt.so in this jre, not libmawt.so, it’s not a typo)
  • Go back to oracle software directory
  • Now run the installer again with the ‘patched’ jre that you just set up:
    ./runInstaller -jreLoc /tmp/jre/jre/1.4.2/

And bingo! the installer will start and you can install oracle as usual. Enjoy!

(You could also just run the installer under a different jre by changing the argument to -jreLoc, but I couldn’t find one that worked and the installer tends to be fussy, so its easiest to just stick with the oracle supplied version.)

–Raj.

Raj’s MySQL Top Five Wishlist

Taking the cue from Jay Pipes, as so many other bloggers have done, I present the five things I would most like to see in a future release of MySQL.

1. Metrics, Metrics, Metrics!

The more the better. I want to be able to debug issues on the database, but MySQL lacks good metrics on what exactly the database is doing at any given time, or what it did in the last little while.

2. Lock Trees

Again, I’d like to identify lock holders and blocking sessions on the database. Right now this is almost impossible, specifically with InnoDB.

3. Resource Usage Optimization

A query or a few queries should not be allowed to hog the whole database as they do currently. MySQL should take a long running query and push it down the priority list, giving higher priority to the smaller/faster queries. This will enable the system to still be responsive while a few bad queries are running slowly in the background.

4. Query Tracing

I’d like to be able to get an accurate idea of what resources a query actually takes — under a special debug/trace mode for instance — as opposed to just the time taken.

5. Read-Only Slaves

I would like to be able to set up slaves in a “read-only” mode in which the only thread with the ability to write would be the slave thread. This will save us a lot of headaches where developers “accidentally” run stuff on the slave instead of the master.

Read the rest of this entry . . .

mysql> set global innodb fast=true;

So you ran into some basic limitations with MyISAM when your site got busier. Even single row updates would lock the whole table and slow things down to a crawl. Then you updated to InnoDB to get the benefit of row-level locking, but now the site is even slower than before. What gives?

Well, we had exactly the same issue with one our clients, a large online community site for models and photographers. Updates were killing the system with MyISAM, and converting the tables to InnoDB got rid of the table locks but made the site slower than ever. The buffer pool hit ratio was 1000/1000 and there were no obvious bottlenecks, except that it was all excruciatingly slow.

Here’s what was happening: MySQL by default sets autocommit=1 for every connection. Which isn’t so bad for MyISAM, since a write isn’t guaranteed to disk, but for InnoDB, it has to do a flush to disk for each implicit commit. Which means each insert/update/delete on an InnoDB table will result in a flush to disk.

What’s so bad about flushes to disk? Read the rest of this entry . . .

MySQL Conference 2007 3.1

Another early start. But how could I miss Rasmus talking about PHP. The slides are there if you are interested. The history of PHP is fascinating, and it is interesting to see how it evolved. Rasmus also showed us how the MySQL conference site (run by O’Reilly) and the IBM site were vulnerable to Cross Site Scripting Vulnerabilities! With, of course, tips on how to avoid it, how to write better code, and how to debug performance issues.

Paul Tuckfield is next, with Scaling MySQL at YouTube, and it’s (surprise, surprise) MySQL with replication. And memcache. With Python (okay, that’s a surprise!). I’m not surprised because scaling with MySQL is much more than just throwing more hardware at the problem, and that’s been drilled in over and over again — with all the sites we have worked on and all the fast-growing sites out there. Some common sense I/O settings too (something I personally have pushed for in all the environments I am responsible for): cache reads only in the DB — avoid the filesystem read cache, and the RAID controller read cache. On the other hand, make sure you have a battery-backed write cache on the controller, and that it is working. Multiple spindles is a good idea, and raid 10 is the best for a database. Big stripe/chunk sizes are good, the biggest you can get. I should write a blog on db server disk setup separately, its such a huge topic… so much information, so little time!

One more thing Paul mentioned which I found intriguing was speeding up replication by predictive caching on the replica slaves. Parse the binlog outside and ahead of the slave, convert updates into selects, and run the selects on the slave to predictively cache the db blocks that are going to be updated. It sounds a little complicated and while it had the audience (and me) wow-ing it when he first mentioned it, on second thought, I’m sure there’s other variables at play, since the regular reads on the slave will also interact with these predictive reads, and it may lead to poorer read performance on the slave, even if the updates are a little faster. But an intriguing concept nonetheless, and worth a look.

Zmanda is next with their enterprise backup system based on Amanda. We use Amanda at Pythian, and it’s worked pretty well for us in the 3 years since I implemented it. I have had to recover from tape a few times and it has been quite painless. As any DBA knows, backups are the one thing that can make or break an organization, and the only time you find out how good your backups are (and if you still have your job) is when you need to use the backup. I’m vendor-agnostic as far as backup solutions go, but I do believe in regular testing and validation of backups. There’s no point in having backups that you have never tested. Oh, and RAID is not backup. Another blog in the making!

Sometimes a little downtime is a good thing — I managed to get suspend/hibernate working on my laptop during the keynotes. Just needed a little bit of tweaking. I commented out the hwclock syncs (I know those are problematic on my laptop), and I added in the ndiswrapper and nvidia modules into the list of modules to be removed before suspend/hibernate and reload after. I’ll investigate which one is the cause of the issue by working backwards, but this works for me now and I’m happy.

There is a closing keynote at 3pm, will keep you posted as usual.

–Raj

MySQL Conference 2007 2.1

Starting out bright and early again. I have a feeling Martin and Yannick might gang up on me for waking them up so early every morning and making sure we are on our way at 7am sharp, but it’s worth it. We did make it here for the keynotes.

The first session was Freedom Businesses Protect Privacy, by Eben Moglen. It was an interesting talk, but I’m not sure I share his views on the new, information-sharing technologies like YouTube. From his point of view, it is a voluntary invasion into our privacy that we’re giving these companies as they mine the data for information about us, which is then a marketable asset.

I believe that may be a minor side-effect of an otherwise great way to bring people and cultures together, to make this world a smaller place. I may be wrong and may regret all my information up there on the Web, and maybe I’m giving away something about myself by writing this blog, but I feel it is part of the process of information sharing. I look forward to an interesting debate on this.

The next talk is the Clash of the DB Egos, the egos in question being Monty “MySQL Co-Founder and CTO” Widenius, Heikki “InnoDB” Tuuri, Mikael “NDB Cluster” Ronström, Jim “Falcon” Starkey, Ari “Solid” Valtanen, Paul “Nitro” Whittington, and Mike “IBM/i5″ Smith. It is interesting to see the creators of so many different, even competing, technologies up together, and the battle is mostly benign with more than a few laughs.

Heikki made “Most Memorable Moment” with his answer to the question, what would you like to see on your wikipedia entry 20 years from now: “Wikipedia runs on InnoDB”. Monty was given the title of the first “MySQL Fellow” (in the tradition of the “IBM Fellow“) and he got a standing ovation to the sounds of dozens of laptops falling on the floor as everybody stood up to honour him!

Bdale Garbee was next with Citizenship: Open Source Community Rewards and Responsibilities. It was refreshing to hear his thoughts on how the corporate world views the Open Source Community even as they participate in the process. Indeed, HP’s contributions are not trival, but the corporate world is still trying to wrap its head around the whole new open source paradigm.

And those were the keynotes for today. Off to the technical sessions now.

–Raj.

MySQL Conference 2007 1.2

So Shirley sent me an SMS to let me know my rehearsal was scheduled, and I managed to set up my (Linux) laptop with the projector, so I’m all set. Special thanks to Lauren from O’Reilly conferences who is awesome with computers and projectors, and didn’t blink an eye when I told him we might have a bit of issue. As it turned out, we solved it quite smoothly with the NVidia set-up utility, but we did have to plug in a local monitor to fool it into setting the right resolution. For some reason, it detected the projector as a TV and only allowed me to set 640×480.

The sessions themselves are awesome, there’s just so much information out here. It’s hard to choose which session to go to and which one to miss. Oh well, I guess there’s always a next time.

The session on MySQL stored procedures was interesting, if a little dry — the original presenter could not make it and the substitute was too obviously not familiar with the presentation.

As I learned at the session on technology at Digg.com, the digg (digg.com) site uses a lot of intelligent caching to generate dynamic pages for individual users with minimal hit on the database. Interestingly enough, their slave servers have more horsepower than the master, so they know that should they have to promote a slave, they will still be in good shape. Quite a refreshing change from the usual set up where slaves/standby servers are less capable than the main db.

I had an interesting lunch discussion with Jay Pipes all about the roadmap for Falcon, InnoDB and MySQL, with some neat tools and features thrown in. Carsten has apparently written a PHP/ncurses-based command-line tool for MySQL (I will try to get you the link to it later) which can do a bunch of queries and switch between the results on the fly with tabs and such. Jay called it “vim for the MySQL CLI”. Sounds intriguing!

Meanwhile my presentation went well, I had a packed room, no standing room either. It is quite interesting that in spite of the fact that there have been a few tuning presentations already, there’s still interest in more. A quick question revealed the room was quite evenly split between DBAs and Developers..

Some of the questions I got asked:

  • How do we know if we’re getting close to using up our “buffers” — i.e. if a sort for a query is getting close to the sort_buffer_size. The short answer — we don’t. Unfortunately, MySQL has very little metrics for us to use, so the only time we will know is when we do go over the allocated buffer and it spills to disk, causing a noticeable performance drop.
  • What is the best database engine? Or, which one would I recommend? Quite frankly, thats almost like choosing a car — it depends on what you want it to do, how much you can spend, what you want out of the engine, and what appeals to you. In other words, take a good hard look at your needs, your application, the benefits and limitations of each engine and then decide.
  • The recommended InnoDB flush method on Linux: O_DIRECT
  • The way InnoDB keeps track of data also generated a lot of interest. The fact that datafiles are only written to at a checkpoint was a revelation to most people, so I’ll explain it again: all commits are synced to disk but only to the InnoDB log files. It is only at a checkpoint that the changed data is written to the data files. This reduces I/O to the data files and avoids a lot of small, random writes to the data files. However, data is never lost, since a commit does generate a write to the log file and it is synced to disk. A checkpoint is forced when the log files switch, so it is important to have a big log file size, usually 25%-50% of the InnoDB buffer pool size.

I’ll explain it a little more in detail in a separate post. Meanwhile this one is going out a little late — I got back late last night, the BoF sessions (Birds of a Feather) were very interesting and continued way past their schedule. Heikki Tuuri of Innobase was brilliant, and then the BoF on tuning just brought out the usual common pain points with MySQL and PHP.

Stay tuned!

–Raj.

Slides from My Presentation at the MySQL Conference

Here you go — slides of the presentation. I will probably blog on the presentation and any interesting questions I got, but that will have to wait till after the conference.

MySQL Conference 2007: Real World MySQL Tuning (PDF)

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @sheeri: #confoo talk "Bending Queries to your Will with EXPLAIN" slides http://bit.ly/explainslides & handout
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more