Posts Tagged ‘innodb’

Overview of Transaction Logging in MySQL

By Keith Murphy November 6th, 2008 at 3:22 pm
Posted in MySQL
Tags:

I was doing some research over the weekend on how transactions work “under the hood” in MySQL. I thought it might be enlightening if I wrote about what I found.

The database system must resolve the situation where, for whatever reason, the database server stops working in the middle of a transaction.  Perhaps the power goes out, the hard drive crashes, or the CPU disappears in a cloud of smoke. Maybe the MySQL Server itself crashes for some reason. What does the MySQL Server do when operation resumes?

Potentially, there are transactions in an inconsistent state. How are inconsistent transactions resolved? They are resolved through a recovery process involving log files that are called transactional logs. There are two kinds of transactional logs: redo logs and undo logs.

(more…)

MySQL Schema Synchronization and GUI Tools

By Keith Murphy October 21st, 2008 at 3:22 pm
Posted in MySQL
Tags:

Or, There is a Use for GUI Tools!

Recently I was working on a problem for a customer. They are converting a server with two InnoDB data files to innodb_file_per_table:

$ ls -l ibdata*
-rwxr-xr-x   1 mysql    mysql       2.0G Oct 14 13:10 ibdata1
-rwxr-xr-x   1 mysql    mysql       350G Oct 14 13:10 ibdata2

I honestly don’t recall ever seeing (or hearing about) so large a data file.

The method chosen for conversion boils down to this:

  1. stop and start the server to enable innodb_file_per_table
  2. alter all tables to myisam
  3. stop server and delete ibdata file
  4. restart server
  5. convert tables back to InnoDB
  6. add foreign keys

This post isn’t about how we did it, or about whether or not it was the best way, or anything like that. This post is really about the the last two steps.

There were over 500 InnoDB tables and something like 375 foreign keys. So I needed to somehow create the ALTER TABLE statements for these 500 servers. To drop the Foreign Keys, I used the following SELECT statement:

SELECT CONCAT( "mysql -u root -pPASS ", table_schema," -e 'ALTER TABLE ",table_name, ' DROP FOREIGN KEY ', constraint_name,';'' &') AS list_of_fks FROM information_schema.KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_NAME is not null ORDER BY TABLE_NAME, COLUMN_NAME;

This created a list of statements that I could then put in a batch file, edit a little, and have them run in parallel.

I used the following SELECT statement to find all InnoDB tables and prepare a command to convert them to MyISAM:

SELECT CONCAT("mysql -u root -pPASS ", TABLE_SCHEMA, " -e 'ALTER TABLE ", TABLE_NAME, " ENGINE=MYISAM;' & ") FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='INNODB';

Again, the output can be put in a batch file, modified a little, and run in parallel.

Once everything was converted to MyISAM, I shut down the database and removed the InnoDB log and data files and modified the my.cnf to so I had file_per_table and a much smaller InnoDB data file. That was the fairly straightforward part. Now the fun began. (more…)

MySQL: RENAME TABLE on Transactional Tables Can Jeopardize Slave Data

By Raj Thukral October 8th, 2008 at 7:09 pm
Posted in MySQL
Tags:

Do you have a master-slave MySQL set up?  Ever do DDL changes on the master?  You may be hit with a serious data integrity bug.  Read on.

One of our clients does a regular rename tables on the master to keep the current table small and archive off old data.  We’d occasionally be hit by a ‘duplicate key’ error on the slave on the current table and have to resolve it manually.  Digging into the issue, I managed to replicate it on demand and filed bug 39675 with MySQL, which subsequently has been verified and slated for fix, though from what it seems only in version 6.0.  The bug affects all versions of MySQL from 4.1 to 6.0.

In a nutshell, here is what happens. The rename tables command only checks for pending transactions or locks in the current session.  If there is a pending transaction in another session on the table being renamed, the rename will succeed, but the order in which the transaction is written to the binlog will be different from the order in which the transactions were applied on the master.  This means that the data on the slave will now be out of sync for this table.

Here’s a test-case:

(more…)

InnoDB logfiles

By Sheeri Cabral October 1st, 2008 at 7:10 pm
Posted in MySQL
Tags:

The unsung heroes of InnoDB are the logfiles. They are what makes InnoDB automatic crash recovery possible.

Database administrators of other DBMS may be familiar with the concept of a “redo” log. When data is changed, affected data pages are changed in the innodb_buffer_pool. Then, the change is written to the redo log, which in MySQL is the InnoDB logfile (ib_logfile0 and ib_logfile1). The pages are marked as “dirty”, and eventually get flushed and written to disk.

If MySQL crashes, there may be data that is changed that has not been written to disk. Those data pages were marked as “dirty” in the innodb_buffer_pool, but after a MySQL crash the innodb_buffer_pool no longer exists. However, they were written to the redo log. On crash recovery, MySQL can read the redo log (InnoDB log files) and apply any changes that were not written to disk.

That is the basic functionality of the InnoDB log files. Given this, let’s look at some of the different parameters and their ramifications.

innodb_log_files_in_group is set with a default of 2. The logfiles are written in a circular manner — ib_logfile0 is written first, and when it has reached its maximum size, then ib_logfile1 will be written to.

innodb_log_file_size is the size of each log file in the log group. The total, combined size of all the log files has to be less than 4 Gb (according to the MySQL manual). Because the logfiles contain changes in the buffer pool that have not been written to disk, the total, combined size of all the log files should not be more than the innodb_buffer_pool_size.

If all the log files in the group are full of changes that have not been written to disk, MySQL will start to flush dirty pages from the InnoDB buffer pool, writing the changes to disk. If the log files are small, changes will be written to disk more often, which can cause more disk I/O.

When InnoDB does a crash recovery, it reads the log files. If the log files are large, it will take longer to recover from a crash. If innodb_fast_shutdown is set to 0, the log files are purged when MySQL shuts down — larger files mean a longer shutdown time. The default for innodb_fast_shutdown is 1, which means that the log files are not purged before a shutdown. Starting in MySQL 5.0.5, you can set it to 2, which simulates a crash, and at the next startup InnoDB will do a crash recovery.

innodb_flush_log_at_trx_commit controls how often the log files are written to. A value of 0 causes the log files to be written and flushed to disk once per second. The default is 1, which causes the log buffer to be written and flushed to disk after every transaction commit. The value can also be set to 2, which causes the log buffer to be written after every transaction commit and flushes the log files to disk once per second. A value of 2 means that MySQL might think that some changes are written to the log file, but do not persist in the log file after an operating system crash, because the log file was not flushed to disk before a crash.

Note that some filesystems are not honest about flushing to disk, so even though you may have the default value of 1, your system may be acting as if it has a value of 2. Setting this parameter to 2 means that there will be less I/O, at the cost of not being able to recover data from a crash.

innodb_flush_method changes how InnoDB opens and flushes data and log files. See the manual for details; the end result is a tradeoff in I/O performance versus whether or not an operating system crash would leave the InnoDB log files in an inconsistent state.

innodb_log_buffer_size is the write buffer for InnoDB log files. The larger the buffer is, the less often the log files are written to. This can save I/O.

Why You Want to Switch to MySQL 5.1

In two words: online operations. In a paragraph: Forget partitioning, row-based replication and events. The big reasons most people are going to salivate over 5.1, and probably start plans to upgrade now, are the online operations:

  • online ALTER TABLE for column rename, column default value change, and adding values to the end of an ENUM/SET
  • Online, table-based logging. No more need to restart your server to enable or change the general or slow query logs. You can have the standard file-based output or choose a table format…which you can query.
  • (more…)

MySQL Back to Basics: Analyze, Check, Optimize, and Repair

By Nicklas Westerlund July 25th, 2008 at 11:53 am
Posted in MySQL
Tags:

It felt like the right time for us to look back at some useful commands for table maintenance that some of us may not have mastered as much as we might like to think.

In my post about gathering index statistics, I referred to OPTIMIZE TABLE, ANALYZE TABLE, and REPAIR TABLE — but I never explained in depth what the different commands do, and what the differences between them are. That is what I thought I would do with this post, focusing on InnoDB and MyISAM, and the differences in how they treat those commands. I will also look at different cases and see which one is right for in each case.

(more…)

Falcon Transactional Characteristics

By Keith Murphy July 14th, 2008 at 9:56 pm
Posted in MySQL
Tags:

It’s time to continue our series on the transactional storage engines for MySQL. Some might question why I even include Falcon because it is very much beta at this time. MySQL, however, has made quite an investment into Falcon, and while it is currently beta, the code is improving and it looks like that it will be production-worthy when MySQL server 6.0 hits GA.

If this is the case, it is important to begin to understand what Falcon was designed for and how it differs from other transactional engines such as InnoDB. I am going to concentrate quite a bit on the Falcon/InnoDB comparison as that is what everyone wants to talk about. This is despite my having heard MySQL employees repeatedly make statements to the effect of, “Falcon is not going to replace InnoDB,” or “Falcon is not competing with InnoDB.” Well, take that with a grain of salt. It certainly seems to me that they are competing for the same spot.

Warning

As I said, Falcon is beta. First off, don’t even try to use it in production. Using it in production means you will also be using MySQL Server 6.0, which itself is considered alpha. Your data will explode, be corrupted, or eaten by jackals. It won’t be pretty. It will cause great pain.

In addition, the features of Falcon are still changing. What I say here might or might not be accurate in the future.

End of Warning

So, why was Falcon even created?

(more…)

InnoDB Transactional Characteristics

InnoDB is a storage engine that uses MVCC (described shortly) to provide ACID-compliant transactional data storage using row-level locking.  MVCC stands for Multi-Version Concurrency Control.  It is how InnoDB allows multiple transactions to look at a data set of one or more tables and have a consistent view of the data. MVCC keeps a virtual snapshot of the dataset for each transaction.  An example will make this clear.

Let’s assume you have two transactions (and only two transactions) running on a system. If transaction A starts at 10:45:56 and ends at 10:45:89, it gets a consistent view of the dataset during the time that the transaction runs.  If transaction B starts at 10:45:65, it would see exactly the same view of the dataset that transaction A saw when it began the transaction.  If transaction B started at 10:45:95, it would see the modified dataset after transaction A made modifications. During the duration of each transaction, the dataset that each sees does not change, except for the modifications the transaction itself makes.  Consider that a typical production database server is running hundreds of queries a second, and you realize that the job of MVCC/the InnoDB storage engine gets very complicated maintaining all these views of the data.

MySQL server storage engines use three different locking options: table-level locking, page-level locking, and row-level locking.  With table-level locking, if a query accesses the table it will lock the entire table and not allow access to the table from other queries.  The benefit of this is that it entirely eliminates deadlocking issues. The disadvantage is that, as mentioned, no other queries have access to the table while it is locked.  If you had a table with 16,000,000 rows and needed to modify one row, the entire table is inaccessible by other queries. The MyISAM and memory storage engine use table-level locking.

Page-level locking is locking of a group of rows instead of a the entire table. The number of rows actually locked will vary based on a number of factors. Going back to our example of a 16,000,000-row table, lets assume a page-level lock is used.  If a page consists of 1,000 rows (this would vary depending on the size of the rows and the actual amount of memory allocated to a page), a lock would lock only a thousand rows.  Any of the other 15,999,000 rows could be used by other queries without interference. The BDB storage engine uses page-level locking.

Row-level locking, as the name suggests, acquires a lock on as small an amount as a single row from a table. This will block the minimal amount of table content and allows for the most concurrency on a table without problems. InnoDB and Falcon both use row-level locking.

While the InnoDB configuration options are not strictly related to the transactional characteristics (other than innodb_flush_log_at_trx_commit), I thought it would be useful to have them here for reference.  These are the most common or most important configuration parameters:

(more…)

Differences Between innodb_data_file_path and innodb_file_per_table

By Keith Murphy June 20th, 2008 at 1:35 pm
Posted in MySQL
Tags:

Recently, a customer wondered if they should start using the innodb_file_per_table option, or if they should continue to use the large InnoDB tablespace files created by the innodb_data_file_path option in the my.cnf option file.

Many people still use the older innodb_data_file_path option because it is the default for MySQL server. So, what are the benefits of using innodb_file_per_table instead?

The innodb_file_per_table makes for easier-to-manage files. With this option each InnoDB table has its own data and index file under the database directory. As an example, if you had table foo located in database xyz the InnoDB data file for table foo would be /var/lib/mysql/data/xyz/foo.idb. Each table would have its own idb table in the appropriate database directory. This is in contrast to using the innodb_data_file_path option with (typically) one large file in the root of your data directory. For example, it might be /var/lib/mysql/data/ibdata1.idb. All table data and indexes would be stored in this one file, and it can be very large  and unwieldy. I don’t recall the largest ibdata file I have seen, but what do you do if you have a 100 gig InnoDB tablespace file? It can, and does, happen. The file contains what amounts to all the data of all your databases on the server.

(more…)

InnoDB’s Adaptive Hash

By Sheeri Cabral December 12th, 2007 at 1:04 pm
Posted in Group Blog PostsMySQL
Tags:

MySQL’s manual page for InnoDB’s adaptive hash states:

If a table fits almost entirely in main memory, the fastest way to perform queries on it is to use hash indexes. InnoDB has a mechanism that monitors index searches made to the indexes defined for a table. If InnoDB notices that queries could benefit building a hash index, it does so automatically.

Note that the hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on the pattern of searches that InnoDB observes for the B-tree index. A hash index can be partial: It is not required that the whole B-tree index is cached in the buffer pool. InnoDB builds hash indexes on demand for those pages of the index that are often accessed.

In a sense, InnoDB tailors itself through the adaptive hash index mechanism to ample main memory, coming closer to the architecture of main-memory databases.

This sounds like a good idea. However, with memory capacities increasing, InnoDB can create more and more hash indexes. While these hash indexes are fast, they also require the same maintenance as regular indexes. This means that the hash index needs to be written to, and therefore locked, whenever data is changed.

In addition, there’s a bug in InnoDB, detailed in Bug Report 20358 and Bug Report 29560. There’s a workaround in MySQL 5.0.52 (released Nov. 30 2007) and fixed in MySQL 5.0.54 (not yet released).

The first referenced bug report contains the following description:

[The read lock within row_search_for_mysql()] is a SELECT query done as a consistent read, and the read view has already been allocated. . . . [description of algorithm elided] Our algorithm is NOT prepared to inserts interleaved with the SELECT, and if we try that, we can deadlock on the adaptive hash index semaphore!

With multi-core machines (and innodb_thread_concurrency) rapidly on the rise, more and more of these adaptive hash index semaphore deadlocks are seen. You can see this bug implemented if you have many transactions that “holds adaptive hash latch” — see the “Transactions” section of the innodb_lock_monitor or in SHOW ENGINE INNODB STATUS

I guess there finally is a benefit to having MySQL Enterprise — the ability to disable InnoDB’s Adaptive Hash right now, with MySQL 5.0.52. Recall that even numbers are Enterprise, and odd ones are Community. The next edition of the Community server does not have a release date, but as the previous revision was October 19th, it could be a wait as long as April for a source build — with a build promised “once per quarter”, that could be March 31st. For a binary build, it may be longer — we were promised a binary build at least twice a year, though I believe MySQL’s intentions are approximately once every 6 months. With luck, by the MySQL Conference & Expo mere community users will be able to turn off adaptive hashing.

I am highly annoyed that I cannot find a reference to this bugfix (or is it a feature addition, the addition of a new flag?) in the Release Notes:

http://dev.mysql.com/doc/refman/5.0/en/releasenotes-es-5-0-52.html

However, I do know what version it is in because 1) MySQL Support said so and 2) The bug report says so (see 18 Oct. 23:27). On that note, there is no reference to the new flag in the 5.1.23 release notes either. At least that is consistent! Although the http://lists.mysql.com/commits/34798

As a postscript to this — if you are a developer, whether or not it’s for Oracle (this was an InnoDB bug, after all…) if you put a “smart” feature into your software, make sure you have the means to easily disable it. What is smart today is not always smart tomorrow. In this particular case, the adaptive hashing is not causing the issue, but the locking that the index requires does cause the issue. However, having the flag earlier on would have saved a lot of grief, at least while the developers could figure out what was going on.