<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.6.5" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: MySQL Replication Failures</title>
	<link>http://www.pythian.com/blogs/1273/mysql-replication-failures</link>
	<description>News and views from Pythian DBAs</description>
	<pubDate>Tue,  6 Jan 2009 11:55:29 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
		<item>
		<title>By: Diamond Notes &#187; MySQL Replication Failures</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-295981</link>
		<dc:creator>Diamond Notes &#187; MySQL Replication Failures</dc:creator>
		<pubDate>Fri, 31 Oct 2008 17:53:59 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-295981</guid>
		<description>[...] MySQL Replication Failures    October 02nd, 2008 &#124; Category: MySQL, Replication, checksums, maatkit, replication failure, row-based replication, slave drift   Over the weekend, I worked on a client’s two computers, trying to get a slave in sync with the master. It was during this time that I began thinking about: a) how this never should have happened in the first place. b) how “slave drift” could be kept from happening. c) how this is probably keeping some businesses from using MySQL. d) how MySQL DBAs must spend thousands of hours a year wasting time fixing replication issues. I’ll be the first person to tell you that the replication under MySQL is pretty much dead-simple to set up. My only complaint is that it is annoying to type in the two-line “CHANGE MASTER” command to set up a new slave. Even so, it makes sense. It is also very easy, however, for a slave to end up with different data than the master server has. This can be caused by replication bugs, hardware problems, or by using non-deterministic functions. Without proper permissions, a user/developer/DBA can log into the slave server and mess the data up that way. This last is a database administrator problem, but it affects replication. There are probably other issues that astute readers will point out. I would like to point out one common issue that would probably be categorized as a replication bug. If the master crashes for whatever reason (say, a hosting company accidentally punches the power button on a master server) it will often cause corruption of the binary log. When the master comes back up, the slave cries about a non-existent binary log position. Possible solutions: (more…) [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] MySQL Replication Failures    October 02nd, 2008 | Category: MySQL, Replication, checksums, maatkit, replication failure, row-based replication, slave drift   Over the weekend, I worked on a client’s two computers, trying to get a slave in sync with the master. It was during this time that I began thinking about: a) how this never should have happened in the first place. b) how “slave drift” could be kept from happening. c) how this is probably keeping some businesses from using MySQL. d) how MySQL DBAs must spend thousands of hours a year wasting time fixing replication issues. I’ll be the first person to tell you that the replication under MySQL is pretty much dead-simple to set up. My only complaint is that it is annoying to type in the two-line “CHANGE MASTER” command to set up a new slave. Even so, it makes sense. It is also very easy, however, for a slave to end up with different data than the master server has. This can be caused by replication bugs, hardware problems, or by using non-deterministic functions. Without proper permissions, a user/developer/DBA can log into the slave server and mess the data up that way. This last is a database administrator problem, but it affects replication. There are probably other issues that astute readers will point out. I would like to point out one common issue that would probably be categorized as a replication bug. If the master crashes for whatever reason (say, a hosting company accidentally punches the power button on a master server) it will often cause corruption of the binary log. When the master comes back up, the slave cries about a non-existent binary log position. Possible solutions: (more…) [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith Murphy</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285356</link>
		<dc:creator>Keith Murphy</dc:creator>
		<pubDate>Fri, 03 Oct 2008 00:10:18 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285356</guid>
		<description>Hey everyone,
thanks for the input.  Jason, I am aware of Continuent's work. I haven't had a chance to work with it so I don't know if it really resolves any issues. And to me, this is something that should be resolved "in the database".

Keith</description>
		<content:encoded><![CDATA[<p>Hey everyone,<br />
thanks for the input.  Jason, I am aware of Continuent&#8217;s work. I haven&#8217;t had a chance to work with it so I don&#8217;t know if it really resolves any issues. And to me, this is something that should be resolved &#8220;in the database&#8221;.</p>
<p>Keith</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arjen Lentz</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285305</link>
		<dc:creator>Arjen Lentz</dc:creator>
		<pubDate>Thu, 02 Oct 2008 22:32:21 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285305</guid>
		<description>For some additional tools, see http://openquery.com.au/resources/tools/nagios_scripts</description>
		<content:encoded><![CDATA[<p>For some additional tools, see <a href="http://openquery.com.au/resources/tools/nagios_scripts" rel="nofollow">http://openquery.com.au/resources/tools/nagios_scripts</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Holoboff</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285294</link>
		<dc:creator>David Holoboff</dc:creator>
		<pubDate>Thu, 02 Oct 2008 22:09:48 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285294</guid>
		<description>I also did not want to wonder about the status of tables and their data in our replicated setup, so I made a little interface to help me along...  It did the following:

1) compared the number of rows in tables on master and slave (after a flush, I selected from information_schema.TABLES).  This is easily done with MyISAM tables, and thus was performed for these tables nightly.  If the values did not match, I then performed a COUNT(*), and this resolved more than half of them (an interesting anomaly in and of itself);
2) if the number of rows was off by more than one (sometimes replication would still be sending a record over for insertion on the slave), then I had a utility that checked for matching values, as long as the table had an auto_increment column for the primary key (there are other solutions if it does not).  This would insert the missing row(s) on the slave or remove (and log) extra rows from the slave;
3) if there was time, I would optimize the table; most often, I did not have time until a planned outage.

Note that I did not do this for InnoDB tables (although possible) and I did not have to worry about ACID compliance with this setup.  This was meant for website data that did not need ACID compliance.

I also would not run a checksum except in the most extreme circumstances, as it took a very long time.</description>
		<content:encoded><![CDATA[<p>I also did not want to wonder about the status of tables and their data in our replicated setup, so I made a little interface to help me along&#8230;  It did the following:</p>
<p>1) compared the number of rows in tables on master and slave (after a flush, I selected from information_schema.TABLES).  This is easily done with MyISAM tables, and thus was performed for these tables nightly.  If the values did not match, I then performed a COUNT(*), and this resolved more than half of them (an interesting anomaly in and of itself);<br />
2) if the number of rows was off by more than one (sometimes replication would still be sending a record over for insertion on the slave), then I had a utility that checked for matching values, as long as the table had an auto_increment column for the primary key (there are other solutions if it does not).  This would insert the missing row(s) on the slave or remove (and log) extra rows from the slave;<br />
3) if there was time, I would optimize the table; most often, I did not have time until a planned outage.</p>
<p>Note that I did not do this for InnoDB tables (although possible) and I did not have to worry about ACID compliance with this setup.  This was meant for website data that did not need ACID compliance.</p>
<p>I also would not run a checksum except in the most extreme circumstances, as it took a very long time.</p>
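<p>For anyone who wants to try something similar, steps 1 and 2 above could be sketched roughly as follows. This is only an illustration, not the original utility: it assumes the per-table row counts and the auto_increment key values have already been fetched from each server, and the function names are hypothetical.</p>

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_drifted_tables(master_counts: Dict[str, int],
                        slave_counts: Dict[str, int],
                        tolerance: int = 1) -> List[str]:
    """Return tables whose row counts differ by more than `tolerance`.

    A table missing on one side is treated as having 0 rows there.
    The default tolerance of 1 allows for a row still in flight.
    """
    tables = set(master_counts) | set(slave_counts)
    return sorted(
        t for t in tables
        if abs(master_counts.get(t, 0) - slave_counts.get(t, 0)) > tolerance
    )


def diff_primary_keys(master_ids: Iterable[int],
                      slave_ids: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Return (missing_on_slave, extra_on_slave) for one table's key values."""
    master, slave = set(master_ids), set(slave_ids)
    return master - slave, slave - master


if __name__ == "__main__":
    # Made-up counts, standing in for values read from each server.
    master = {"users": 1000, "orders": 5400, "sessions": 87}
    slave = {"users": 1000, "orders": 5397, "sessions": 86}
    print(find_drifted_tables(master, slave))

    # Made-up key values for a single drifted table.
    missing, extra = diff_primary_keys([1, 2, 3, 4], [2, 3, 5])
    print(missing, extra)
```

<p>Rows in the first set would be inserted on the slave, and rows in the second set removed (and logged). Loading every key into memory is only reasonable for small tables; larger ones would need to be compared in key ranges.</p>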
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285259</link>
		<dc:creator>Xaprb</dc:creator>
		<pubDate>Thu, 02 Oct 2008 20:15:23 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285259</guid>
		<description>Gerry, I'm not sure that's accurate.  People have been talking about the need for it, yeah, and MySQL agreed, but that doesn't mean it's in the works.</description>
		<content:encoded><![CDATA[<p>Gerry, I&#8217;m not sure that&#8217;s accurate.  People have been talking about the need for it, yeah, and MySQL agreed, but that doesn&#8217;t mean it&#8217;s in the works.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gerry Narvaja</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285219</link>
		<dc:creator>Gerry Narvaja</dc:creator>
		<pubDate>Thu, 02 Oct 2008 18:19:44 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285219</guid>
		<description>The replication 'checksum' patch has been in the works since the last UC. Not sure when it will make it out or why it hasn't happened yet.</description>
		<content:encoded><![CDATA[<p>The replication &#8216;checksum&#8217; patch has been in the works since the last UC. Not sure when it will make it out or why it hasn&#8217;t happened yet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason</title>
		<link>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285198</link>
		<dc:creator>Jason</dc:creator>
		<pubDate>Thu, 02 Oct 2008 17:11:32 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/1273/mysql-replication-failures#comment-285198</guid>
		<description>The Continuent folks recently released their "Tungsten Stack" that is supposed to make replication more robust:

http://www.continuent.com/index.php?option=com_content&#38;task=view&#38;id=417&#38;Itemid=88

Excerpt:

The initial feature set for Continuent Tungsten Replicator includes:

    * Simple set-up procedure
    * Proper handling of master failover in presence of multiple slaves (for high availability)
    * Master/slave replication of one, some, or all databases
    * MySQL statement replication
    * Checksums on replication events
    * Table consistency check mechanism
    * Runs on Linux, Solaris and Windows.

Cheers</description>
		<content:encoded><![CDATA[<p>The Continuent folks recently released their &#8220;Tungsten Stack&#8221; that is supposed to make replication more robust:</p>
<p><a href="http://www.continuent.com/index.php?option=com_content&amp;task=view&amp;id=417&amp;Itemid=88" rel="nofollow">http://www.continuent.com/index.php?option=com_content&amp;task=view&amp;id=417&amp;Itemid=88</a></p>
<p>Excerpt:</p>
<p>The initial feature set for Continuent Tungsten Replicator includes:</p>
<p>    * Simple set-up procedure<br />
    * Proper handling of master failover in presence of multiple slaves (for high availability)<br />
    * Master/slave replication of one, some, or all databases<br />
    * MySQL statement replication<br />
    * Checksums on replication events<br />
    * Table consistency check mechanism<br />
    * Runs on Linux, Solaris and Windows.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
</channel>
</rss>
