<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: What Applications Are Good For MySQL Cluster?</title>
	<link>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster</link>
	<description>News and views from Pythian DBAs</description>
	<pubDate>Sat, 22 Nov 2008 12:49:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.2</generator>
		<item>
		<title>By: Mike</title>
		<link>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-284247</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Tue, 30 Sep 2008 19:02:36 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-284247</guid>
		<description>Konstantin,

Did you ever receive a response to your questions outside this forum?  Have you decided yet what solution you are going with?  I have many of the same questions you had as I research all the MySQL-based solutions available.  I am leaning towards MySQL Cluster, because I think the database I am dealing with may only be 10 or 15 GB max and needs to have virtually zero downtime.

Mike</description>
		<content:encoded><![CDATA[<p>Konstantin,</p>
<p>Did you ever receive a response to your questions outside this forum?  Have you decided yet what solution you are going with?  I have many of the same questions you had as I research all the MySQL-based solutions available.  I am leaning towards MySQL Cluster, because I think the database I am dealing with may only be 10 or 15 GB max and needs to have virtually zero downtime.</p>
<p>Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Konstantin Rozinov</title>
		<link>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-242530</link>
		<dc:creator>Konstantin Rozinov</dc:creator>
		<pubDate>Thu, 24 Jul 2008 06:05:50 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-242530</guid>
		<description>Hi Sheeri,

I'm working with a few friends to try to launch a social network type of site, with many of the typical social network site features (profiles, photos, comments, videos, etc.).  We are using a LAMP environment.  I've been tasked with researching MySQL Cluster, MySQL Replication, and how various other large sites cope with large amounts of traffic and solve their scalability and HA issues.  We don't want to become popular one night and be overwhelmed by the traffic.

What I've found is this:
- Replication is widely used by many of the largest sites
- memcached is widely used by many of the largest sites
- MySQL Cluster is not widely used at all (not sure why - seems like a great product)

As far as I can tell, Replication has some problems:
- the delay in syncing current data to the slaves.
- replication mainly helps read-intensive applications; it does not spread the write load.
- need to modify web application to read from slaves and write to master.
- MOST IMPORTANTLY: the single point of failure and bottleneck point with 1 master server.
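
To illustrate the kind of application change the third point means, here is a minimal sketch in Python.  It is only an illustration under my own assumptions: the class name, the server names and the "anything that is not a SELECT goes to the master" rule are hypothetical and not taken from any real library or site.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical helper: pick a server name for each SQL statement."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves or [master])

    def server_for(self, sql):
        # Only plain SELECTs go to a slave; everything else must hit the master.
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
print(router.server_for("SELECT name FROM profiles WHERE id = 1"))
print(router.server_for("INSERT INTO comments (body) VALUES ('hi')"))
```

Note that every write still lands on the single master, and a read sent to a slave may return stale data because of the replication delay mentioned above.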

From what I read and understand, MySQL Cluster is ideal for both read and write intensive applications and is built for high availability and scalability.  Seems like a great solution.  But why is no one using it for web applications?

It seems everyone is recommending replication, but I have concerns about it:

1.  Initially, I expect that there will be as many writes as reads as more and more users create profiles, post photos, comments, etc.  Considering that replication is great for read-intensive applications, would replication be of any help here?

2.  The SPOF with 1 Master MySQL server really scares me.  I've read about Master-Master Replication but again, the bottleneck would be the writes.  Am I wrong?

3.  Even if I partition my data across different master databases, if one of them fails then part of the site (and potentially the entire site) might go offline.  Or am I wrong?

4.  How do these big sites use replication without running into write performance issues?


Thanks for any help and suggestions!

Konstantin</description>
		<content:encoded><![CDATA[<p>Hi Sheeri,</p>
<p>I&#8217;m working with a few friends to try to launch a social network type of site, with many of the typical social network site features (profiles, photos, comments, videos, etc.).  We are using a LAMP environment.  I&#8217;ve been tasked with researching MySQL Cluster, MySQL Replication, and how various other large sites cope with large amounts of traffic and solve their scalability and HA issues.  We don&#8217;t want to become popular one night and be overwhelmed by the traffic.</p>
<p>What I&#8217;ve found is this:<br />
- Replication is widely used by many of the largest sites<br />
- memcached is widely used by many of the largest sites<br />
- MySQL Cluster is not widely used at all (not sure why - seems like a great product)</p>
<p>As far as I can tell, Replication has some problems:<br />
- the delay in syncing current data to the slaves.<br />
- replication mainly helps read-intensive applications; it does not spread the write load.<br />
- need to modify web application to read from slaves and write to master.<br />
- MOST IMPORTANTLY: the single point of failure and bottleneck point with 1 master server.</p>
<p>From what I read and understand, MySQL Cluster is ideal for both read and write intensive applications and is built for high availability and scalability.  Seems like a great solution.  But why is no one using it for web applications?</p>
<p>It seems everyone is recommending replication, but I have concerns about it:</p>
<p>1.  Initially, I expect that there will be as many writes as reads as more and more users create profiles, post photos, comments, etc.  Considering that replication is great for read-intensive applications, would replication be of any help here?</p>
<p>2.  The SPOF with 1 Master MySQL server really scares me.  I&#8217;ve read about Master-Master Replication but again, the bottleneck would be the writes.  Am I wrong?</p>
<p>3.  Even if I partition my data across different master databases, if one of them fails then part of the site (and potentially the entire site) might go offline.  Or am I wrong?</p>
<p>4.  How do these big sites use replication without running into write performance issues?</p>
<p>Thanks for any help and suggestions!</p>
<p>Konstantin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonas Oreland</title>
		<link>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-181376</link>
		<dc:creator>Jonas Oreland</dc:creator>
		<pubDate>Fri, 18 Apr 2008 11:31:49 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-181376</guid>
		<description>Further corrections:
1) The data is stored in fixed-length columns. So VARCHAR values will act like CHAR values

Fixed in 5.1

2) When data is deleted in cluster, the memory is freed up for that table only. To free up the memory for any table to use, a rolling node restart is needed.

This is fixed in CGE-6.2
and in CGE-6.3 we support online non-blocking optimize table
(though currently only for varsize part of data)

/jonas</description>
		<content:encoded><![CDATA[<p>Further corrections:<br />
1) The data is stored in fixed-length columns. So VARCHAR values will act like CHAR values</p>
<p>Fixed in 5.1</p>
<p>2) When data is deleted in cluster, the memory is freed up for that table only. To free up the memory for any table to use, a rolling node restart is needed.</p>
<p>This is fixed in CGE-6.2<br />
and in CGE-6.3 we support online non-blocking optimize table<br />
(though currently only for varsize part of data)</p>
<p>/jonas</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Roland Bouman</title>
		<link>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-161424</link>
		<dc:creator>Roland Bouman</dc:creator>
		<pubDate>Wed, 13 Feb 2008 16:50:31 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/826/what-applications-are-good-for-mysql-cluster#comment-161424</guid>
		<description>Hi Sheeri,

nice write up! A few comments though: 

"In-memory tables use hash tables for storing indexes and data, so exact matches are important, for both writes and reads."

Well, it's the UNIQUE indexes (and PRIMARY KEY) that use a HASH index. You do have tree-like indexes (T-TREE is the algorithm, it's an ordered index like BTREE though) in Cluster for the non-unique indexes. To complicate things, if you create a UNIQUE index (either explicit or implicit by creating a UNIQUE or PRIMARY KEY constraint), in addition to the HASH index, Cluster will automatically create the T-TREE index as well! You can use the USING HASH clause to *prevent* Cluster from automatically creating the T-TREE index.

These T-TREE indexes are used for all range scans, and are quite good for that. The reason that JOIN performance is poor is not so much (or at least not only) the indexes but the implementation of the JOIN: the MySQL server performs the join and does not batch the requests for the records from the joined table. In other words, it does a round trip for each record.

I believe there is currently a patch in the Carrier Grade edition that uses batched access for joins, as well as a number of other improvements that improve JOIN performance quite a lot (5-10 times for some scenarios)

"I wouldn’t use MySQL Cluster on a system that needs every single piece of information logged to the nth degree, like a financial app, and that if there’s a catastrophe, some data loss is acceptable. "

Heh - actually this does not make sense to me ;) In many if not most scenarios, Cluster is set up to have multiple nodes per node group. As long as at least one node per group is alive, and all node groups are alive, the cluster is alive and the data is available. So, it is in fact more resilient to catastrophe than InnoDB (which is basically lost whenever the disk is broken): if you have, say, 4 nodes per node group, you need no fewer than 4 broken machines in the same node group before your cluster is down - and even then you might have lost 2 seconds of data at the most (assuming a default global checkpointing interval).
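
The availability rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not of how Cluster implements it; the function name and the node layout are made up for the example.

```python
def cluster_is_alive(node_groups):
    """node_groups: list of lists of booleans, True meaning that data node is up."""
    # The cluster survives only if every node group still has at least one live node.
    return all(any(group) for group in node_groups)


# Two node groups with 4 nodes each (illustrative numbers only).
groups = [[True, False, False, False], [True, True, True, True]]
print(cluster_is_alive(groups))  # True: one survivor in group 0 is enough

groups[0][0] = False
print(cluster_is_alive(groups))  # False: all 4 nodes of group 0 are down
```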

It is of course possible to build up the node groups in a manner that there is still a single point of failure. For example, if all nodes in one node group are fed off one electricity socket, well, then there is just one socket that can break the cluster.  Of course, you should build it in a way that that can't happen ;)

"As far as I know there’s no way to choose the algorithm, so you can’t partition your data as you might want to in a data warehouse or other scenario. I think that’s why MySQL chooses to use the term “fragment” when they talk about data nodes instead of “partition”."

It's true: in 5.0 you cannot choose partitioning. In 5.1, you can use the PARTITION BY clause and at least control the number of partitions, but I believe HASH or KEY partitioning is still required.

If you have the chance, pick up a copy of the MySQL 5.1 Cluster Certification Study Guide. It's got a lot of info that may be of help working with MySQL Cluster. It also covers disk-based data and other 5.1 features.

kind regards,

Roland Bouman</description>
		<content:encoded><![CDATA[<p>Hi Sheeri,</p>
<p>nice write up! A few comments though: </p>
<p>&#8220;In-memory tables use hash tables for storing indexes and data, so exact matches are important, for both writes and reads.&#8221;</p>
<p>Well, it&#8217;s the UNIQUE indexes (and PRIMARY KEY) that use a HASH index. You do have tree-like indexes (T-TREE is the algorithm, it&#8217;s an ordered index like BTREE though) in Cluster for the non-unique indexes. To complicate things, if you create a UNIQUE index (either explicit or implicit by creating a UNIQUE or PRIMARY KEY constraint), in addition to the HASH index, Cluster will automatically create the T-TREE index as well! You can use the USING HASH clause to *prevent* Cluster from automatically creating the T-TREE index.</p>
<p>These T-TREE indexes are used for all range scans, and are quite good for that. The reason that JOIN performance is poor is not so much (or at least not only) the indexes but the implementation of the JOIN: the MySQL server performs the join and does not batch the requests for the records from the joined table. In other words, it does a round trip for each record.</p>
<p>I believe there is currently a patch in the Carrier Grade edition that uses batched access for joins, as well as a number of other improvements that improve JOIN performance quite a lot (5-10 times for some scenarios)</p>
<p>&#8220;I wouldn’t use MySQL Cluster on a system that needs every single piece of information logged to the nth degree, like a financial app, and that if there’s a catastrophe, some data loss is acceptable. &#8221;</p>
<p>Heh - actually this does not make sense to me ;) In many if not most scenarios, Cluster is set up to have multiple nodes per node group. As long as at least one node per group is alive, and all node groups are alive, the cluster is alive and the data is available. So, it is in fact more resilient to catastrophe than InnoDB (which is basically lost whenever the disk is broken): if you have, say, 4 nodes per node group, you need no fewer than 4 broken machines in the same node group before your cluster is down - and even then you might have lost 2 seconds of data at the most (assuming a default global checkpointing interval).</p>
<p>It is of course possible to build up the node groups in a manner that there is still a single point of failure. For example, if all nodes in one node group are fed off one electricity socket, well, then there is just one socket that can break the cluster.  Of course, you should build it in a way that that can&#8217;t happen ;)</p>
<p>&#8220;As far as I know there’s no way to choose the algorithm, so you can’t partition your data as you might want to in a data warehouse or other scenario. I think that’s why MySQL chooses to use the term “fragment” when they talk about data nodes instead of “partition”.&#8221;</p>
<p>It&#8217;s true: in 5.0 you cannot choose partitioning. In 5.1, you can use the PARTITION BY clause and at least control the number of partitions, but I believe HASH or KEY partitioning is still required.</p>
<p>If you have the chance, pick up a copy of the MySQL 5.1 Cluster Certification Study Guide. It&#8217;s got a lot of info that may be of help working with MySQL Cluster. It also covers disk-based data and other 5.1 features.</p>
<p>kind regards,</p>
<p>Roland Bouman</p>
]]></content:encoded>
	</item>
</channel>
</rss>
