Posted by Danil Zburivsky on Dec 13, 2011
I had to refresh my knowledge on how InnoDB threads queue works the other day when debugging activity spikes on one of the customer’s production system and while I had general idea about InnoDB kernel and queue, thread concurrency and queue join delays I didn’t have a complete model of how InnoDB concurrency control works. This is what I knew before I started investigation:
- You can limit the number of threads that allowed to be executed inside InnoDB kernel with innodb-thread-concurrency
- If all slots in kernel are occupied other threads have to wait in the queue
- Thread will sleep for time specified by innodb-thread-sleep-delay before entering the queue
I also knew that there is variable innodb-concurrency-tickets and that it allows a thread to enter the InnoDB kernel several times before it has to wait in the queue again. But how actually it happens? Why would thread need to leave the kernel? Is it related to transactions that run multiple queries? This is something I didn’t know. Read the rest of this entry . . .
Posted by Danil Zburivsky on Sep 1, 2011
MySQL Replication is a powerful tool and it’s hard to find a production system not using it. On the other hand debugging replication issues can be very hard and time consuming. Especially if your replication setup is not straightforward and you are using filtering of some kind.
Read the rest of this entry . . .
Posted by Danil Zburivsky on May 31, 2011
I was reading a brilliant book Relational Database Index Design and the Optimizers by Tapio Lahdenmaki and Mike Leach. At the end of one of the chapters I came across the exercise to design two indexes for a given query: one to minimize the number of index rows scanned and second one to eliminate sorting.
Using algorithm described in the book I quickly came up with two indexes and while first one looked fine, I was really confused by the second one for the elimination of the sort. Let me show an example, not copy one from the book, but rather show a test I did with MySQL.
Read the rest of this entry . . .
Posted by Danil Zburivsky on Jan 27, 2011
A couple of days ago I was reading a paper Paxos Made Live – An Engineering Perspective written by Google engineers. It is an interesting reading about implementation of Paxos algorithm for building a fault-tolerant database. But one paragraph made me think I am reading something very familiar:
We decided to err on the side of caution and to rollback our system to the old version of Chubby (based on 3DB) in one of our data centers. At that point, the rollback mechanism was not properly documented (because we never expected to use it), its use was non-intuitive, the operator performing the roll-back had no experience with it, and when the rollback was performed, no member of the development team was present. As a result, an old snapshot was accidentally used for the rollback. By the time we discovered the error, we had lost 15 hours of data and several key datasets had to be rebuilt.
This really looked like one of the incident notification we at Pythian send to our customers in case of production outage or any other significant issue. Don’t get me wrong, I am not saying: “Look, big guys, like Google, have problems too!”. The point here is that when you manage production environment of any scale, whether it is a multi-terabyte heavily loaded system, or a “one database” website, you face similar organizational problems. This short paragraph points to some very important questions you should be asking yourself everyday if you are in charge of a production system.
Are all of your standard procedures, like backup/restore well documented? You never know who will be dealing with issues when thunder strikes.
Do you have a proper escalation procedures, so every production support team member knows where to look for help, in case he is stuck or in doubt?
Do you crosscheck work done by others? This can help you catch things like wrong backup used for restore, suggest a way to improve one’s work, or learn something new from your colleagues and make existing process better.
Stop trusting your own procedures. Test and verify them from time to time. Things tend to change, sometimes unnoticed. So if you haven’t tried to restore your backup for 3 months, you can’t really be sure it works.
Posted by Danil Zburivsky on Jan 12, 2011
Today I’ve spent some time (more than this issue was worth, actually) on a client’s system trying to
find out why table was not accessible and failed with the following error:
mysql> SHOW CREATE TABLE test_table;
ERROR 1033 (HY000): Incorrect information in file: './test/test_table.frm'
Read the rest of this entry . . .
Posted by Danil Zburivsky on Oct 9, 2008
Quite often we need to perform a so-called “MySQL instance audit”. This common DBA procedure should give you a general view of the MySQL environment. You may be interested in a basic understanding of what kind of operation MySQL performs, how much memory does it use, or how well does it look from the performance point of view. There is no easy out-of-the-box way to do such an audit on a MySQL server. You can use SHOW STATUS and check the list of system variables, but this way can hardly be called DBA-friendly.
Fortunately there are several tools to make this process easier. Among most popular are mysqlreport and MySQLTuner. In this post I’d like to give a brief overview of MySQLTuner.
So, what can MySQLTuner do? Quoting the documentation: “MySQLTuner is a script written in Perl that will assist you with your MySQL configuration and make recommendations for increased performance and stability. Within seconds, it will display statistics about your MySQL installation and the areas where it can be improved.”
It’s not magic — they don’t use any “hidden” or unknown MySQL features to provide the report. What they do is use SHOW STATUS metrics and provide a user-friendly report, interpreting data this or that way. What makes such tools really valuable is the way they interpret that data.
Read the rest of this entry . . .