What Applications Are Good For MySQL Cluster?

Feb 13, 2008 / By Sheeri Cabral

Someone asked me which applications are good or bad fits for MySQL Cluster. Now that I've actually had experience with a Cluster setup and a real-life application of it, and have dug through the manual, I present a few characteristics of applications that will work with Cluster, and why, so that if your application meets some of the characteristics but not all of them, you can decide whether Cluster is worth it.

Firstly, I'll state this: MySQL Cluster actually suits a very limited set of applications. I haven't assessed the disk-based cluster, only the memory-based one, so I don't know what really changes with disk-based storage. But after you see this list, you will certainly want to re-think your use of the disk-based cluster if a lot of the inner workings don't change.

The factors are listed below, but the "ruler" I keep in my mind is the fact that MySQL Cluster was developed for telecom applications. The basic profile is a lot of writes, small data in amounts that can fit into memory, and data that is transient in nature. Something like a "session" table is a great application.

So here are the characteristics of a good application for MySQL Cluster, with explanations of why:

The data and indexes must fit into memory. Each cluster has node groups. Each node group holds a fragment of the data, and each node group has a number of replicas. So a cluster with 2 node groups with 4 replicas each has the data split in half, with 4 machines redundantly storing one half the data, and another 4 redundantly storing the other half. Therefore, the data doesn’t have to entirely fit into RAM, but it does have to fit in (RAM of 1 node)*(# node groups).
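
To make the sizing concrete, here is a minimal config.ini sketch. The parameter names (NoOfReplicas, DataMemory, IndexMemory) are real MySQL Cluster settings, but the hostnames and sizes are made up for illustration. With NoOfReplicas=2 and four data nodes, the cluster forms two node groups, so total in-memory capacity is roughly DataMemory times two:

```ini
# Hypothetical config.ini: 4 data nodes, 2 replicas -> 2 node groups.
[ndbd default]
NoOfReplicas=2      # copies of each fragment within a node group
DataMemory=2048M    # per-node RAM for data; total capacity ~ 2 x 2048M
IndexMemory=256M    # per-node RAM for hash indexes

[ndbd]
HostName=data1.example.com
[ndbd]
HostName=data2.example.com
[ndbd]
HostName=data3.example.com
[ndbd]
HostName=data4.example.com
```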

The application has a lot of writes, more than a dual-master setup can handle. Dual-master (also called master-master) replication is pretty fault-tolerant, but it doesn’t scale — circular replication is not fault tolerant at all, and one slave can only have one master (though one master can have many slaves) so you can’t make a mesh. With cluster, the data and sql nodes are separate, and each data node can be (and is) written to, so you can get that mesh network.

Few uses of range queries. In-memory tables use hash tables for storing indexes and data, so exact matches are important, for both writes and reads. Regular on-disk tables use B-trees for both data and indexes, so finding data in a range is fast and easy. A hash, as in-memory tables use, is very fast for an exact lookup, but not as fast as folks are used to for range queries. This also means that table joins often perform poorly: if a join touches more than a few rows, it does not necessarily perform well. Finding "the next row in the index/table" is no faster than finding any given row in the index/table, and data and indexes aren't clustered together by anything intuitive.
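
To make this concrete, here is a hedged sketch of the kind of table that plays to these strengths, a session store keyed by exact lookups (the table and column names are made up):

```sql
-- A session table suited to NDB's hash-based primary key lookups.
CREATE TABLE sessions (
    session_id CHAR(32) NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    payload    VARBINARY(1024),
    PRIMARY KEY (session_id) USING HASH
) ENGINE=NDBCLUSTER;

-- Exact match: resolved through the hash, fast.
SELECT user_id, payload FROM sessions WHERE session_id = 'abc123';

-- Range scan: a hash index can't help here, so expect it to be slower.
SELECT COUNT(*) FROM sessions WHERE session_id BETWEEN 'a' AND 'm';
```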

Sorting data is fast because it's all done in memory, but if you're retrieving a lot of data for a sort, you run into the retrieval problems described in the previous paragraph. So I say that lots of sorts are bad, even though the reason they're bad is that a lot of sorting means a lot of lookups. It's probably not fair of me to put it that way, because it's really the lookups that are bad, but I think most people don't make the connection between lots of sorts and lots of lookups at first glance.

To be fair, in this case "bad" means "worse than an on-disk table that is cached in memory", because on-disk tables are ordered both on disk and in memory. So it's not that Cluster will perform badly, it just won't perform tons better than an on-disk table (particularly if the first point about memory still holds true).

The data must have a primary key. If it doesn't, the NDB engine will make one for it with an auto-increment value.
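
For illustration, a quick sketch with made-up names; both tables end up with a primary key, but only the second exposes one you can actually query:

```sql
-- No explicit primary key: NDB silently adds a hidden auto-increment one.
CREATE TABLE events (
    msg VARCHAR(255)
) ENGINE=NDBCLUSTER;

-- Declaring the key yourself makes it visible and usable in queries.
CREATE TABLE events_keyed (
    id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    msg VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=NDBCLUSTER;
```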

The data is stored in fixed-length columns. So VARCHAR values act like CHAR values, meaning each VARCHAR(60) is stored in 60 characters, even if the string is 'hi'. For the purpose of this discussion I'm going to assume we're using single-byte characters, so that's 60 bytes. For this reason it doesn't make sense to store varying-length data unless you're OK with the wasted space; keep in mind that the size of data pages matters, and MySQL Cluster currently uses 32KB data pages. There's a hard limit of 8052 bytes per row (I have no idea how they got that), with the knowledge that there's a 16-byte overhead per row. An index record has the same limitations, though a hash index costs 25 bytes of overhead in addition to the size of the key. If the key is larger than 32 *bytes*, there's an additional 8 bytes of overhead.

That all doesn't sound like a lot, but when you're adding an additional 8 bytes for 10 million records, it's a big deal: an index with a 30-byte key uses 524MB of space for the index plus overhead, and an index with a 35-byte key uses 648MB. And that's just the size of the index itself; folks are used to having lots of tables and lots of indexes. So even if your data seems small, all that overhead might increase the size by half again, or double it, depending on the columns used (remember, VARCHAR acts like CHAR) and how many indexes are used.
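
You can reproduce that arithmetic yourself: 25 bytes of hash-index overhead per record, plus 8 more once the key exceeds 32 bytes:

```sql
-- Hash index size for 10 million records, in MB:
SELECT (30 + 25)     * 10000000 / 1024 / 1024 AS mb_30_byte_key,  -- ~524
       (35 + 25 + 8) * 10000000 / 1024 / 1024 AS mb_35_byte_key;  -- ~648
```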

There's also a hard limit on the number of "metadata objects"; I don't know if there's an easy way to calculate this, but it includes at least the number of logical databases (mmod only has 1), tables, columns, indexes, views, triggers, stored procedures, etc. The limit is 20,320. This means Cluster isn't a catch-all solution by any means, and is really only suited to smaller applications.

When data is deleted in Cluster, the memory is freed up for that table only. To free up the memory for any table to use, a rolling node restart is needed (sketched below). This works if the tables keep a proportional amount of data over time, but if tables have different usage patterns during different time periods and DELETEs are used, then Cluster is not what you want.
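
A rolling restart is driven from the management client, one data node at a time. A minimal sketch, assuming the data nodes have node IDs 2 through 5:

```sh
# Restart each data node in turn, letting it rejoin before moving on.
for node in 2 3 4 5; do
    ndb_mgm -e "$node RESTART"
    # In practice, poll `ndb_mgm -e "ALL STATUS"` until this node
    # reports started again before restarting the next one.
done
```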

Applications that use a lot of small transactions work well. The NDB storage engine used in MySQL Cluster is transaction-aware, but larger transactions use more memory, so there's more overhead and more risk with large transactions on Cluster than on a regular on-disk table. It's possible to burn yourself here, because there are parameters capping many transaction-related operations, including how many transactions can be running at the same time; this is more restrictive than something like max_connections as a mysqld parameter, because connections aren't necessarily running a query.
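
The knobs in question live in config.ini. The parameter names below are real; the values are only illustrative, so check the defaults for your version:

```ini
[ndbd default]
MaxNoOfConcurrentTransactions=4096   # simultaneous transactions per node
MaxNoOfConcurrentOperations=32768    # rows touched by in-flight transactions
```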

Disk writes are done in a few ways (though if you like playing with fire, you can specify that writes are never done). One way is writing the REDO log. Entries in the REDO log aren't deleted until there have been three checkpoints since the record was inserted. In high-write clusters, checkpoints usually happen back to back: once a checkpoint is finished, the next checkpoint starts. The default (which we haven't changed) is to wait until 4MB of data has been written, and then checkpoint again. There are global checkpoints as well, which default to happening every 2 seconds. I could go into all the ways that checkpoints are configurable, but suffice it to say there are a lot of different knobs that can be tweaked. Data pages are also written at intervals that depend on many of those knobs.
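
For reference, the two intervals mentioned above map to these config.ini parameters; the values shown are the defaults as I understand them:

```ini
[ndbd default]
# Local checkpoints: start a new one after ~4MB of write activity.
# The value is an exponent: 20 means 4 x 2^20 bytes of writes.
TimeBetweenLocalCheckpoints=20
# Global checkpoints: commit the REDO log to disk every 2 seconds.
TimeBetweenGlobalCheckpoints=2000    # milliseconds
```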

Suffice it to say that I wouldn't use MySQL Cluster on a system that needs every single piece of information logged to the nth degree, like a financial app; it's better suited to systems where, if there's a catastrophe, some data loss is acceptable. For session-type tables, that's appropriate, and perhaps even expected.

Backups are online and non-blocking, which is a nice feature, but not a reason to use Cluster all by itself.
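
Kicking one off is a single command in the ndb_mgm management client:

```sh
# Start an online, non-blocking backup across all data nodes.
ndb_mgm -e "START BACKUP"
```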

Which data nodes the data is sent to is decided by the Cluster, via a hash. As far as I know there's no way to choose the algorithm, so you can't partition your data as you might want to in a data warehouse or other scenario. I think that's why MySQL chooses to use the term "fragment" when they talk about data nodes instead of "partition".

There’s probably more, but that’s as far as I’ve gotten. Anyone have any other characteristics they’d like to add? Or clarifications, or spot any errors in what I’ve written?

4 Responses to “What Applications Are Good For MySQL Cluster?”

  • Hi Sheeri,

    nice write up! A few comments though:

    “In-memory tables use hash tables for storing indexes and data, so exact matches are important, for both writes and reads.”

    Well, it’s the UNIQUE indexes (and PRIMARY KEY) that use a HASH index. You do have tree-like indexes (T-TREE is the algorithm, it’s an ordered index like BTREE though) in Cluster for the non-unique indexes. To complicate things, if you create a UNIQUE index (either explicit or implicit by creating a UNIQUE or PRIMARY KEY constraint), in addition to the HASH index, Cluster will automatically create the T-TREE index as well! You can use the USING HASH clause to *prevent* Cluster from automatically creating the T-TREE index.

    These T-TREE indexes are used for all range scans, and are quite good for that. The reason that JOIN performance is poor is not so much (or at least not only) the indexes, but the implementation of the JOIN: the MySQL server performs the join without batching the requests for the records from the joined table, so it does a roundtrip for each record.

    I believe there is currently a patch in the Carrier Grade edition that uses batched access for joins, as well as a number of other improvements that improve JOIN performance quite a lot (5-10 times for some scenarios).

    “I wouldn’t use MySQL Cluster on a system that needs every single piece of information logged to the nth degree, like a financial app, and that if there’s a catastrophe, some data loss is acceptable. ”

    Heh – actually this does not make sense to me ;) In many if not most scenarios, Cluster is set up to have multiple nodes per node group. As long as at least one node per group is alive, and all node groups are alive, the cluster is alive and the data is available. So it is in fact more resilient to catastrophe than InnoDB (which is basically lost whenever the disk is broken): if you have, say, 4 nodes per node group, you need no fewer than 4 broken machines in the same node group before your cluster is down, and even then you might have lost 2 seconds of data at the most (assuming a default global checkpointing interval).

    It is of course possible to build up the node groups in a manner that leaves a single point of failure. For example, if all nodes in one node group are fed off one electricity socket, well, then there is just one socket that can break the cluster. Of course, you should build it in a way that can't happen ;)

    “As far as I know there’s no way to choose the algorithm, so you can’t partition your data as you might want to in a data warehouse or other scenario. I think that’s why MySQL chooses to use the term “fragment” when they talk about data nodes instead of “partition”.”

    It’s true: in 5.0 you cannot choose partitioning. In 5.1, you can use the PARTITIONING clauses and at least control the amount of partitioning, but I believe HASH and KEY partitioning are still required.

    If you have the chance, pick up a copy of the MySQL 5.1 Cluster Certification Study Guide. It's got a lot of info that may be of help working with MySQL Cluster. It also covers disk-based data and other 5.1 features.

    kind regards,

    Roland Bouman

  • Jonas Oreland says:

    Further corrections:
    1) The data is stored in fixed-length columns. So VARCHAR values will act like CHAR values

    Fixed in 5.1

    2) When data is deleted in cluster, the memory is freed up for that table only. To free up the memory for any table to use, a rolling node restart is needed.

    This is fixed in CGE-6.2
    and in CGE-6.3 we support online non-blocking optimize table
    (though currently only for varsize part of data)

    /jonas

  • Konstantin Rozinov says:

    Hi Sheeri,

    I'm working with a few friends to try to launch a social network type of site, with many of the typical social network features (profiles, photos, comments, videos, etc). We are using a LAMP environment. I've been tasked with researching MySQL Cluster, MySQL Replication, and how various other large sites cope with large amounts of traffic and solve their scalability and HA issues. We don't want to become popular one night and be overwhelmed by the traffic.

    What I’ve found is this:
    - Replication is widely used by many of the largest sites
    - memcached is widely used by many of the largest sites
    - MySQL Cluster is not widely used at all (not sure why – seems like a great product)

    As far as I can tell, Replication has some problems:
    - the delay in syncing current data to the slaves.
    - replication is really only ideal for read-intensive applications.
    - need to modify web application to read from slaves and write to master.
    - MOST IMPORTANTLY: the single point of failure and bottleneck point with 1 master server.

    From what I read and understand, MySQL Cluster is ideal for both read and write intensive applications and is built for high availability and scalability. Seems like a great solution. But why is no one using it for web applications?

    It seems everyone is recommending replication, but I have concerns about it:

    1. Initially, I expect that there will be as many writes as reads as more and more users create profiles, post photos, comments, etc. Considering that replication is great for read-intensive applications, would replication be of any help here?

    2. The SPOF with 1 Master MySQL server really scares me. I’ve read about Master-Master Replication but again, the bottleneck would be the writes. Am I wrong?

    3. Even if I partition my data across different master databases, if one of them fails then part of the site (and potentially the entire site) might go offline. Or am I wrong?

    4. How do these big sites use replication without running into write performance issues?

    Thanks for any help and suggestions!

    Konstantin

  • Mike says:

    Konstantin,

    Did you ever receive a response to your questions outside this forum? Have you decided yet what solution you are going with? I have many of the same questions you had as I research all the MySQL-based solutions available. I am leaning towards MySQL Cluster, because I think the database I am dealing with may only be 10 or 15 GB max and needs to have virtually zero downtime.

    Mike
