What Applications Are Good For MySQL Cluster?
Feb 13, 2008 / By Sheeri Cabral
Someone asked me what applications were good/bad for MySQL Cluster. As I’ve now actually had experience with a Cluster setup and a real-life application of it, and dug through the manual, I present a few characteristics of applications that will work with Cluster, and why they are so (so that if you have an application that meets some of the characteristics but not all of them, you can decide whether it’s worth it to use Cluster or not).
Firstly, I’ll state this — there’s actually a very limited application to MySQL cluster. I haven’t assessed the disk-based cluster, only the memory-based one, so I don’t know what really changes with disk-based. But after you see this list, you certainly will want to re-think your use of disk-based cluster if a lot of the inner workings don’t change.
The factors are listed below, but the “ruler” I keep in my mind is the fact that MySQL Cluster was developed for telecom applications. The basic characteristic is a lot of writes, small data in amounts that can fit into memory, and the data being transient in nature. Something like a “session” table is a great application.
So here are the characteristics of a good application for MySQL Cluster, with explanations of why:
The data and indexes must fit into memory. Each cluster has node groups. Each node group holds a fragment of the data, and each node group has a number of replicas. So a cluster with 2 node groups with 4 replicas each has the data split in half, with 4 machines redundantly storing one half the data, and another 4 redundantly storing the other half. Therefore, the data doesn’t have to entirely fit into RAM, but it does have to fit in (RAM of 1 node)*(# node groups).
The application has a lot of writes, more than a dual-master setup can handle. Dual-master (also called master-master) replication is pretty fault-tolerant, but it doesn’t scale — circular replication is not fault tolerant at all, and one slave can only have one master (though one master can have many slaves) so you can’t make a mesh. With cluster, the data and sql nodes are separate, and each data node can be (and is) written to, so you can get that mesh network.
Few uses of range queries. In-memory tables use hash tables for storing indexes and data, so exact matches are important, for both writes and reads. Regular on-disk tables use B-trees for both data and indexes, so finding data in a range is fast and easy. Using hashes, as in-memory tables do, is very fast for an exact lookup, but not as fast as folks are used to for range queries. This also means that table joins are often poor — basically if it’s joining more than a few rows, it does not necessarily perform well. Finding “the next row in the index/table” is not faster than finding any given row in the index/table. Data and indexes aren’t clustered together by anything intuitive.
Sorting data is fast because it’s all done in memory, but if you’re retrieving a lot of data for a sort, then you run into the problems with retrieving the data as I stated in the previous paragraph. So I say that lots of sorts are bad, even though the reason they’re bad is because if you’re doing a lot of sorting, you’re doing a lot of information lookups. It’s probably not fair of me to say that, because it’s really the lookups that are bad, but I think most people do not make the correlation between lots of sorts and lots of lookups at first glance.
To be fair, in this case “bad” means “worse than an on-disk table that is cached in memory” because on-disk tables are ordered on-disk and in-memory. So it’s not that it will perform badly, but it won’t perform tons better than an on-disk table (particularly if the first point about memory still holds true).
The data must have a primary key. If it doesn’t the NDB engine will make one for it with an autoincrement value.
The data is stored in fixed-length columns. So VARCHAR values will act like CHAR values — meaning each VARCHAR(60) is stored in 60 characters, even if the string is ‘hi’. For the purpose of this discussion I’m going to assume we’re using single-byte characters, so that’s 60 bytes. For this reason it doesn’t make sense to store varying data unless you’re OK with wasting the overhead — keeping in mind the size of data pages matters, MySQL currently uses 32Kb data pages. There’s a hard limit of 8052 bytes per row (I have no idea how they got that), with the knowledge that there’s a 16 byte overhead per row. An index record has the same limitations, though the size of a hash index is 25 bytes of overhead in addition to the size of the key. If the key is larger than 32 *bytes*, there’s an additional 8 bytes of overhead.
That all doesn’t sound like a lot, but when you’re adding an additional 8 bytes for 10 million records, it’s a big deal — an index that’s 30 bytes long uses 524Mb of space for the index plus overhead, and an index that’s 35 bytes long uses 648Mb. And that’s just the size of the index itself; folks are used to having lots of tables and lots of indexes. So even if your data seems small, all that overhead might increase the size half again, or double the size, depending on the columns used (remember VARCHAR acts like CHAR) and how many indexes are used.
There’s also a hard limit on the number of “metadata objects” — I don’t know if there’s an easy way to calculate this, but it includes at least # of logical databases (mmod only has 1), # of tables, # of columns, # of indexes, # of views, # of triggers, # of stored procedures, etc. The limit is 20,320. This means cluster isn’t a catch-all solution by any means, and is really good for smaller applications.
When data is deleted in cluster, the memory is freed up for that table only. To free up the memory for any table to use, a rolling node restart is needed. This works if the tables have a proportional amount of data usage, but if tables have different usages during different time periods and DELETEs are used, then Cluster is not what you want.
Applications that use a lot of small transactions work well. The NDB storage engine used in MySQL Cluster is transaction-aware, but as larger transactions use more memory, there’s more overhead and more risk involved with larger transactions on Cluster than on a regular on-disk table. It’s possible to burn yourself because there are parameters for many transaction-related operations, including how many transactions can be running at the same time — this is more restrictive than something like max_connections as a mysqld parameter, because connections aren’t necessarily running a query.
Disk writes are done in a few ways (though if you like playing with fire, you can specify that writes are never done). One way is writing the REDO log. Entries in the REDO log aren’t deleted until there have been three checkpoints since the record was inserted. In high-write clusters, checkpoints usually happen without any time in between; once a checkpoint is finished the next checkpoint will start. The default (which we haven’t changed) is to wait until 4Mb of data has been written, and then checkpoint again. There are global checkpoints as well, which default to happening every 2 seconds. I could go into all the ways that checkpoints are configurable but there are actually a lot of different knobs that can be tweaked. Data pages are also written at intervals that depend on many different knobs.
Suffice it to say that I wouldn’t use MySQL Cluster on a system that needs every single piece of information logged to the nth degree, like a financial app, and that if there’s a catastrophe, some data loss is acceptable. For session-type tables, that’s appropriate, and perhaps even expected.
Backups are online and non-blocking, which is a nice feature, but not a reason to use Cluster all by itself.
Which data nodes to send data to are done by the Cluster, via a hash. As far as I know there’s no way to choose the algorithm, so you can’t partition your data as you might want to in a data warehouse or other scenario. I think that’s why MySQL chooses to use the term “fragment” when they talk about data nodes instead of “partition”.
There’s probably more, but that’s as far as I’ve gotten. Anyone have any other characteristics they’d like to add? Or clarifications, or spot any errors in what I’ve written?