Falcon Transactional Characteristics
Jul 14, 2008 / By Keith Murphy
It’s time to continue our series on the transactional storage engines for MySQL. Some might question why I even include Falcon because it is very much beta at this time. MySQL, however, has made quite an investment in Falcon, and while it is currently beta, the code is improving and it looks like it will be production-worthy when MySQL Server 6.0 hits GA.
If this is the case, it is important to begin to understand what Falcon was designed for and how it differs from other transactional engines such as InnoDB. I am going to concentrate quite a bit on the Falcon/InnoDB comparison as that is what everyone wants to talk about. This is despite my having heard MySQL employees repeatedly make statements to the effect of, “Falcon is not going to replace InnoDB,” or “Falcon is not competing with InnoDB.” Well, take that with a grain of salt. It certainly seems to me that they are competing for the same spot.
As I said, Falcon is beta. First off, don’t even try to use it in production. Using it in production means you will also be using MySQL Server 6.0, which itself is considered alpha. Your data will explode, be corrupted, or be eaten by jackals. It won’t be pretty. It will cause great pain.
In addition, the features of Falcon are still changing. What I say here might or might not be accurate in the future.
End of Warning
So, why was Falcon even created?
Falcon has been specially developed for systems that support large memory architectures and multi-threaded or multi-core CPU environments. While it performs optimally on 64-bit architectures, it can also be deployed on 32-bit platforms. Up to eight CPU cores and eight gigabytes of RAM is the current “sweet spot”.
Whenever I hear MySQL engineers talk about Falcon, they emphasize this idea that Falcon was created from the ground up for multiple core CPUs and large amounts of RAM. This is in contrast to InnoDB, which was created at a time when servers had very small amounts of RAM and CPU power compared to modern servers.
And there are definitely some benefits to a “new start” as opposed to building on old technology. As an example, consider what Apple did with the Macintosh operating system when they moved to a completely new technology platform — they created OS X. This allowed Apple to create an operating system which performs better on today’s modern hardware.
In InnoDB’s defense (if it needs any), it has aged fairly gracefully, in my opinion. Scaling beyond four cores, however, has been a sore point for some time. Recent work by a Google engineer group (headed by Mark Callaghan) has led to patches being accepted into the InnoDB codebase that allow for almost linear scaling of InnoDB on up to eight cores. In addition, there is work being done on the memory issues to allow InnoDB to use up to 128 GB of RAM.
If you are looking at Falcon and InnoDB from a CPU and memory usage/optimization point of view, they are currently fairly even in comparison. I think competition brings out the best, so this is a good thing all around.
Here are the key features of Falcon (from MySQL.com’s Falcon documentation):
- True Multi-Version Concurrency Control (MVCC) enables records and tables to be updated without the overhead associated with row-level locking mechanisms. The MVCC implementation virtually eliminates the need to lock tables or rows during the update process.
- Flexible locking, including flexible locking levels and smart deadlock detection, keeps data protected and transactions and operations flowing at full speed.
- Optimized for modern CPUs and environments to support multiple threads, allowing multiple transactions and fast transaction handling.
- Transaction-safe (fully ACID-compliant), and able to handle multiple concurrent transactions.
- Serial log provides fast recovery capabilities without sacrificing runtime performance.
- Advanced B-Tree indexes.
- Data compression stores the information on disk in a compressed format, compressing and decompressing data on the fly. The result is smaller and more efficient physical data sizes.
- Intelligent disk management automatically manages data files and extensions. Space within log and data files is automatically reclaimed and reused.
- Data and index caching provides quick access to data without the requirement to load index data from disk.
- Implicit savepoints ensure data integrity during transactions.
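To make the last item in the list concrete: an implicit savepoint lets the engine roll back just a failing statement without abandoning the whole transaction. Here is a minimal Python sketch of that idea; all the class and method names are hypothetical illustrations, not Falcon internals.

```python
# Sketch of implicit savepoints: before each statement inside a transaction,
# the engine records a savepoint; if the statement fails, only that
# statement's changes are undone, not the whole transaction.
# All names here are illustrative, not Falcon internals.

class Transaction:
    def __init__(self):
        self.data = {}    # the view of the data being modified
        self._undo = []   # undo log entries since transaction start

    def execute(self, statement):
        """Run one statement; roll back only its changes on failure."""
        savepoint = len(self._undo)          # implicit savepoint
        try:
            statement(self)
        except Exception:
            # undo just this statement's changes (back to the savepoint)
            while len(self._undo) > savepoint:
                key, old = self._undo.pop()
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old

    def put(self, key, value):
        self._undo.append((key, self.data.get(key)))
        self.data[key] = value


txn = Transaction()
txn.execute(lambda t: t.put("a", 1))     # succeeds

def failing(t):
    t.put("b", 2)
    raise RuntimeError("constraint violation")

txn.execute(failing)                     # rolled back to the savepoint
print(txn.data)                          # {'a': 1}
```

The key point is that the earlier successful statement survives; only the failed statement’s effects disappear.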
Some of the above features are similar to InnoDB’s features. For example, InnoDB is fully transaction-safe and can handle multiple concurrent transactions. And I think it could be easily argued that InnoDB, like Falcon, is now optimized for modern CPUs and supports multiple threads. As I noted earlier, however, this is something that has changed rather recently.
There are some definite differences in features as well. For example, while InnoDB and Falcon both use B-tree indexes, the InnoDB implementation always reads in physical disk order when using the B-tree index, which minimizes I/O response time. Falcon does not do this. Another difference is that Falcon does not support the Read Uncommitted isolation level. Read Uncommitted (sometimes referred to as “dirty” reads) is not needed in most situations. To me, this is a definite plus for Falcon. I don’t think dirty reads should even be an option for a database.
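To see why dirty reads are dangerous, consider a reader that observes a value the writer later rolls back. This toy Python sketch (not real database internals; all names are made up for illustration) shows the difference:

```python
# Toy illustration of why "dirty reads" (READ UNCOMMITTED) are risky:
# a reader observes an in-flight value that the writer later rolls back.
# Conceptual sketch only -- not Falcon or InnoDB internals.

class Store:
    def __init__(self):
        self.committed = {"balance": 100}
        self.uncommitted = {}   # a writer's in-flight changes

    def write(self, key, value):
        self.uncommitted[key] = value

    def read(self, key, isolation):
        if isolation == "READ UNCOMMITTED" and key in self.uncommitted:
            return self.uncommitted[key]   # dirty read
        return self.committed[key]

    def rollback(self):
        self.uncommitted.clear()


db = Store()
db.write("balance", 0)                          # in-flight, never committed

dirty = db.read("balance", "READ UNCOMMITTED")  # sees 0
clean = db.read("balance", "READ COMMITTED")    # sees 100

db.rollback()                                   # the 0 "never existed"
print(dirty, clean)                             # 0 100
```

The dirty reader acted on a value that, after the rollback, was never part of the database at all.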
One of Falcon’s big selling points is that it implements MVCC differently than InnoDB. InnoDB uses row-level locking for transactions, while Falcon “almost never” locks either tables or rows. The idea is that this should generate less overhead (row-level locking consumes memory and CPU resources).
Samer El Sahn had this to say about MVCC in Falcon:
Falcon is multi-version in memory and single-version on disk, True Multi Version Concurrency Control (MVCC) enables records and tables to be updated without the overhead associated with row-level locking mechanisms. The MVCC implementation virtually eliminates the need to lock tables or rows during the update process, also data and index caching provides quick access to data without the requirement to load index data from disk.
You can see how larger server RAM amounts would be good for this setup. RAM amounts and CPU core counts will only continue to rise, so in my opinion this is a good path for Falcon to go down.
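The core MVCC idea described above can be sketched in a few lines of Python: every write creates a new version of a record, and each reader sees the newest version committed at or before its snapshot, so writers never block readers. This is a conceptual sketch with made-up names, not Falcon’s actual data structures.

```python
# Sketch of multi-version concurrency control: each write appends a new
# version tagged with a commit id; a reader sees the newest version
# committed at or before its snapshot, so writers never block readers.
# Illustrative only -- not Falcon's actual implementation.

class MVCCTable:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_id, value)
        self.next_id = 1

    def commit_write(self, key, value):
        self.versions.setdefault(key, []).append((self.next_id, value))
        self.next_id += 1

    def snapshot(self):
        return self.next_id - 1   # a reader's snapshot point

    def read(self, key, snap):
        # newest version visible to this snapshot
        visible = [v for cid, v in self.versions.get(key, []) if cid <= snap]
        return visible[-1] if visible else None


t = MVCCTable()
t.commit_write("row1", "old")
snap = t.snapshot()            # a reader starts here
t.commit_write("row1", "new")  # a concurrent writer commits a new version

print(t.read("row1", snap))            # 'old' -- the reader's snapshot
print(t.read("row1", t.snapshot()))    # 'new' -- a later reader
```

The reader holding the old snapshot is never blocked by the concurrent write; it simply keeps seeing the version that was current when it started.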
Currently (although I understand this will change) Falcon does not support statement-based replication. Starting in MySQL 5.1, both statement-based and row-based replication are supported for binary logging.
Statement-based replication works by sending all the queries run on a master server to the slave server and re-executing them there. With row-based replication, only the row changes produced by those queries are sent from master to slave.
Falcon supports only row-based replication. This is to guarantee consistent results between binary log and data files. It’s going to be a major point of contention for many administrators. We love our statement-based replication. However, there are real benefits for row-based replication. Ever watch your slave get 1,000 seconds behind the master while it is working on a single query that ends up returning 20 updated rows? With row-based replication, that same result would happen in less than a second as it essentially sends the changed rows and not the query itself.
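The difference between the two replication modes can be sketched as follows. In this toy Python model (the tables, predicates, and helper function are all hypothetical, not MySQL code), the statement-based slave re-executes the full query, while the row-based slave just applies the shipped row changes.

```python
# Sketch contrasting statement-based and row-based replication: the
# statement-based slave re-executes the full (possibly expensive) query;
# the row-based slave just applies the changed rows shipped from the
# master. Toy model only -- not MySQL's binary log format.

def run_update(table, predicate, update):
    """Master-side UPDATE: returns the changed rows for row-based shipping."""
    changed = {}
    for key, row in table.items():
        if predicate(row):
            table[key] = update(row)
            changed[key] = table[key]
    return changed

master = {1: {"status": "old"}, 2: {"status": "new"}, 3: {"status": "old"}}

# Statement-based: the slave re-runs the same query from scratch.
stmt_slave = {1: {"status": "old"}, 2: {"status": "new"}, 3: {"status": "old"}}
run_update(stmt_slave, lambda r: r["status"] == "old",
           lambda r: {"status": "archived"})

# Row-based: the master ships only the changed rows; the slave applies them.
changed_rows = run_update(master, lambda r: r["status"] == "old",
                          lambda r: {"status": "archived"})
row_slave = {1: {"status": "old"}, 2: {"status": "new"}, 3: {"status": "old"}}
row_slave.update(changed_rows)   # cheap apply, no query re-execution

print(stmt_slave == master == row_slave)   # True
```

Both slaves end up identical to the master, but the row-based slave got there by applying two small row images rather than re-running the whole scan, which is exactly why a long-running query that touches few rows replicates so much faster in row-based mode.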
A unique feature of Falcon is on-the-fly data compression/decompression when reading and writing to the database. Essentially, when you update your table, Falcon compresses the data as it stores it, with no intervention from you. As you read from the table, an automatic decompression takes place. This is similar to a compressed MyISAM table, except that you can both read and write to the table. The storage space savings for large databases will be nice, and from what I understand there are performance benefits to this as well.
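The idea of transparent compression is easy to demonstrate. This Python sketch uses the standard-library `zlib` module to compress on write and decompress on read; the caller never sees compressed bytes. The class and its methods are made up for illustration and say nothing about Falcon’s actual on-disk format.

```python
# Sketch of transparent on-the-fly compression: data is compressed as it
# is written and decompressed as it is read, with no caller intervention.
# Illustrative only -- Falcon's actual on-disk format is not shown here.
import zlib

class CompressedTable:
    def __init__(self):
        self._rows = {}   # key -> zlib-compressed bytes

    def write(self, key, value):
        # compress transparently on write
        self._rows[key] = zlib.compress(value.encode("utf-8"))

    def read(self, key):
        # decompress transparently on read
        return zlib.decompress(self._rows[key]).decode("utf-8")

    def stored_bytes(self):
        return sum(len(b) for b in self._rows.values())


t = CompressedTable()
payload = "the same phrase repeated " * 100   # highly compressible
t.write("row1", payload)

print(t.read("row1") == payload)              # True: lossless round trip
print(t.stored_bytes() < len(payload))        # True: smaller "on disk"
```

Reads and writes behave exactly as they would on an uncompressed table; only the stored size changes, which is where the space savings for large databases come from.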
I hope this has been a useful overview of what is coming in the Falcon storage engine. I didn’t go into the online backup capabilities of Falcon, as those are really more of a MySQL Server 6.0 feature than a Falcon one. Even so, work is being done to integrate Falcon with online backup so that these backups will work optimally.