Online Verification That Master and Slaves are in Sync

Jul 27, 2010 / By Sheeri Cabral


In October 2008, Baron posted How to Check MySQL Replication Integrity Continually. Here at Pythian we have developed a method based on that post, and added “verifying that masters and slaves are in sync” to our standard battery of tests.

We call it “Continual replication sync checking”. This article explains how it works, how to test it and make the procedure non-blocking, shows benchmarks from the real world, covers issues we encountered along the way, and finally gives Pythian’s procedure for setting up continual replication sync checking in a new environment.

At the 2010 MySQL User Conference & Expo, my co-worker Danil Zburivsky did a presentation about this, and you can get the slides in ODP (Open Office) format or watch the 46-minute video on YouTube.

How it works

On the master, mk-table-checksum is run. In order to make the checksum operation online, we use the modulo and offset features of mk-table-checksum to checksum only part of the data at a time. The checksum is run (from cron) on the master and replicates to the slave. The results are captured in a result table, and a separate process checks the result table and notifies us of any discrepancies.
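Conceptually, the modulo and offset turn one big checksum into many small passes. The following is only a sketch of the idea, not mk-table-checksum’s actual SQL, using a hypothetical orders table with an integer primary key id:

-- Sketch only: with a modulo of 120, each pass checksums roughly 1/120th of the
-- table, and the offset advances by one on each run until everything is covered.
SELECT BIT_XOR(CRC32(CONCAT_WS('#', id, customer_id, total))) AS chunk_crc
FROM orders
WHERE id % 120 = 37;   -- offset 37 on this run, 38 on the next, and so on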

Testing resource usage for non-blocking replication sync checking

Each environment has a different data size, different resources, and a different level of what “intrusive” means. We start with a baseline of the size of the database, which we get from:

SELECT SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024 as sizeGb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE='BASE TABLE';

I included INDEX_LENGTH because the queries that generate the checksum information may or may not use indexes. Note that for InnoDB tables, the metadata is approximate, so this will not necessarily be the exact size of the database, but it will be in the right range. In general, running this query against the INFORMATION_SCHEMA did not consume too many resources, though in many cases it took a few minutes to complete; on databases with several thousand tables, it can take hours.

Of course there is the caveat that it may take a VERY long time to run this in your environment, and some folks have reported crashes (this is VERY rare; I personally have not seen it, even with the most demanding clients we have had, but others have reported it). So if you prefer to be on the safe side, you can look at the size of your ibdata files (whether you use innodb_file_per_table or not) plus the sizes of all the tables on disk (MyISAM, CSV, etc). That will give you a sense of how big the data is, although the size of the files on disk also includes any fragmentation. In practice, I have seen defragmentation reduce the size of a large database by 40%, so beware that neither of these methods is perfect.
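If you go the on-disk route, a couple of quick commands give a rough number. This assumes the default datadir of /var/lib/mysql, so adjust the path for your environment:

# Rough data size from disk; includes fragmentation and anything else that
# lives in the datadir (ibdata files, per-schema directories, logs, etc).
du -sh /var/lib/mysql
du -sh /var/lib/mysql/ibdata* /var/lib/mysql/*/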

Once we get the data size in gigabytes, we can come up with a modulo to test. In general, we have found that mk-table-checksum is quick enough to be non-blocking when we use a modulo value that corresponds to approximately 1 Gb of data and a chunk-size of 100. We actually started out with a chunk-size of 1000, but locking 1000 rows to compute a checksum took longer than was acceptable for clients with lots of transactions. The problem was not the locking on the master; the problem was the lag it created on the slaves. We tried the --sleep option to mk-table-checksum, which worked, but making the chunk-size smaller caused less slave lag than adding a sleep parameter did.
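Put another way, the starting modulo is roughly the data size in gigabytes, rounded up. Something like the following gives a first guess (with the same INFORMATION_SCHEMA caveats as before), which you then adjust based on test runs:

-- First guess only: with this modulo, each checksum pass covers roughly 1 Gb.
SELECT CEIL(SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024) AS suggested_modulo
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE='BASE TABLE';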

Another issue when using chunk-size is that chunking requires an index whose leading column is of a suitable type (an integer, date, time or float column, as in the query below; DECIMAL does not qualify). We ran across clients using a VARCHAR(32) as a PRIMARY KEY (yes, on InnoDB, and yes, performance was an issue there), and tables that did not have any indexes at all (such as logging tables). If mk-table-checksum does not find a suitable index, it will just do a checksum on the whole table.

To find tables that would be problematic, here is a query that can be run from within the INFORMATION_SCHEMA database (again, the usual caveats about INFORMATION_SCHEMA apply):

SELECT CONCAT("SHOW CREATE TABLE ", table_schema,".",table_name,"\G")
FROM TABLES
LEFT JOIN (
     SELECT distinct table_schema,table_name
     FROM STATISTICS INNER JOIN COLUMNS USING
       (table_schema,table_name,column_name)
     WHERE (data_type like '%int' or data_type like 'date%'
     or data_type like 'time%' or data_type='float') and seq_in_index=1
     ) AS good_for_checksum
USING (table_schema,table_name)
WHERE good_for_checksum.table_name IS NULL
and table_schema not in ('information_schema');

Frequency of checksum runs

The frequency at which we run the checksum is also very flexible, so we take the data size and translate it into a modulo that works out “evenly” in a time-based way. For example, on a server that reported 113 Gb in size from the INFORMATION_SCHEMA query above, we set the modulo to 120. The checksum took 10 minutes and 6 seconds from the time the master started to the time the slave finished. It caused no excessive slave lag, and other resource checks, including application response time for non-administrative queries, showed that the test was acceptable.

Given a modulo of 120 that takes about 10 minutes to run and the environment, we decided to run the checksum 9 times per day (hourly for 9 hours during off-peak time). This resulted in the entire data set being checked during a period of just under 2 weeks (120 parts / 9 times per day = 13.333 days).

This means that if there is a data discrepancy, it is discovered within 2 weeks in this particular environment. Though that is not ideal, it is much better than not discovering data discrepancies at all, which is how replication currently works.

Benchmarks From the Real World

The first row in the table below is the example we just went through. The subsequent rows of the table are results from some of the other production environments we run the checksum in. As you can see, we try to keep the data checked at one time to about 1 Gb.

Total Data Size | Modulo | Test time         | Checksum Rate                                  | Frequency | Period
113 Gb          | 120    | 10 min 6 seconds  | 1.59 Mb / sec                                  | 9x / day  | 2 weeks
9 Gb            | 21     | 51 seconds        | 8.4 Mb / sec                                   | 3x / day  | 1 week
29 Gb           | 21     | 9 min 16 seconds  | 2.6 Mb / sec                                   | 3x / day  | 1 week
70 Gb           | 21     | 28 seconds        | 2650 Mb / sec (data was freshly defragmented!) | 3x / day  | 1 week
5.1 Gb          | 21     | 4 min 22 sec      | 0.958 Mb / sec                                 | 3x / day  | 1 week
314.5 Gb        | 336    | 36 min 3 seconds  | 0.44 Mb / sec                                  | 16x / day | 3 weeks

In all of these environments, slave lag was 10 seconds or less at any given point on the slaves.

Issues Encountered

Some of the issues we encountered have workarounds, so I wanted to discuss and explain them here before giving our procedure, which contains the workarounds.

  1. Bug 304 – mk-table-checksum deletes all prior checksum results in the result table on every run. There is no fix for this yet, but if you are using mk-table-checksum only for the procedure described in this article (and in particular are not using the --resume-replication option), you can comment out the following code from mk-table-checksum:

# Clean out the replication table entry for this table.
if ( (my $replicate_table = $final_o->get('replicate'))
     && !$final_o->get('explain') ) {
   use_repl_db(%args); # USE the proper replicate db
   my $del_sql = "DELETE FROM $replicate_table WHERE db=? AND tbl=?";
   MKDEBUG && _d($dbh, $del_sql, $db, $table->{table});
   $dbh->do($del_sql, {}, $db, $table->{table});
}

It is in different places in different versions, but last I checked, searching for "DELETE FROM" in mk-table-checksum only matched three lines of code, and it was pretty clear (due to the inline comment) which block of code to delete. The block shown above is from lines 5154 – 5161 in mk-table-checksum changeset 6647.
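Since the exact line numbers move around between versions, a quick search finds the block in whatever copy you have; for example:

# Show the matching lines with a couple of lines of leading context,
# so the "Clean out the replication table entry" comment is visible.
grep -n -B 2 'DELETE FROM' mk-table-checksum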

  2. Running the checksum may cause “statement not safe for replication” errors, especially in 5.1.

This is usually OK to ignore: mk-table-checksum works precisely because the same statement, once it replicates to the slave and re-executes there, can produce a different result, and that difference is exactly what we want to detect. In MySQL 5.1, CSV tables for the general and slow logs exist by default, even if they are not being written to, and “Statement is not safe to log in statement format” errors show up.

You will need to redirect stderr and expect to see those statements in the MySQL error log. Note that mk-table-checksum works regardless of whether you are using statement-based, row-based or mixed replication.
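For example, when running the checksum by hand, you can send those warnings to their own file rather than your terminal (the log path here is just an example):

perl mk-table-checksum --replicate maatkit.checksum --modulo 120 --chunk-size 100 \
  --offset 'modulo_offset FROM maatkit.checksum_modulo' localhost \
  2>> /home/pythian/logs/checksum-warnings.log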

  3. mk-table-checksum is not perfect, and sometimes shows false positives and false negatives.

This is a hard problem to deal with, and we encourage filing bug reports when false results are found. However, I will note that if mk-table-checksum finds even one otherwise-undetected data integrity issue, then it is useful, because right now there is no other way of detecting these issues in an automated fashion. As more people use mk-table-checksum and help the developers figure out how to fix the false positives and false negatives, I am sure it will only get better.
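When a mismatch is reported and you suspect a false positive, it is worth a manual cross-check before re-syncing anything. One simple option, once the slave has caught up, is MySQL’s built-in CHECKSUM TABLE, run on both the master and the slave; the table name below is just a placeholder:

-- Run the same statement on the master and on the slave and compare the values.
-- EXTENDED reads every row, so it can be slow on large tables.
CHECKSUM TABLE mydb.mytable EXTENDED;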

Pythian’s procedure to set up continual replication sync

  1. Check to make sure all tables have appropriate indexes, as above. If they do not, the nibbling algorithm can be used, though as a caveat I have not tested nibbling with mk-table-checksum.

  2. Figure out the modulo value based on data size, as above.
  3. Decide which database to put the tables that mk-table-checksum uses into. We either use our monitoring database or a database called “maatkit”. Note that it is important to use a database that actually gets replicated!
  4. Get mk-table-checksum and comment out the lines that always delete from the replicate (results) table, as above.
  5. Create and populate the table mk-table-checksum will need for the modulo value:
    CREATE TABLE IF NOT EXISTS `checksum_modulo` (
    `modulo_offset` smallint(5) unsigned NOT NULL default '0' primary key
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT IGNORE INTO checksum_modulo (modulo_offset) VALUES (0);

  6. Do a test run of mk-table-checksum:

perl mk-table-checksum -u avail --ask-pass --algorithm BIT_XOR \
  --replicate maatkit.checksum --create-replicate-table \
  --modulo 120 --chunk-size 100 \
  --offset 'modulo_offset FROM maatkit.checksum_modulo' localhost

And update the modulo table:
update maatkit.checksum_modulo set modulo_offset = ((modulo_offset+1)%120);

And check the results on the slave:
SELECT * FROM maatkit.checksum
WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');

  7. If the test run came out OK, set up a script to run via cron, such as:

# run checksum table 9 times per day, during off-peak times:
30 0,1,2,3,4,5,21,22,23 * * * /home/pythian/bin/checksumtbls.sh >> /home/pythian/logs/checksum.log 2>&1

And the checksumtbls.sh script looks like:

#!/bin/sh

DB=maatkit
USER=maatkit_user
PASS=password
REPLTBL="$DB.checksum"
MODULO=120
CHUNKSIZE=100
OFFSETTBL="checksum_modulo"
ALGORITHM=BIT_XOR
LOG=/home/pythian/logs/checksum.log

/usr/bin/perl /home/pythian/bin/mk-table-checksum -u $USER -p $PASS \
  --modulo $MODULO --algorithm $ALGORITHM --chunk-size $CHUNKSIZE \
  --offset "modulo_offset FROM $DB.$OFFSETTBL" \
  --replicate $REPLTBL --create-replicate-table localhost >> $LOG

/usr/bin/mysql -u $USER -p$PASS -e "update $DB.$OFFSETTBL set modulo_offset=((modulo_offset+1)%$MODULO)" >> $LOG

And of course, do not forget to periodically check on the slave to see where the issues are:

SELECT * FROM maatkit.checksum
WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');
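Because the result table holds one row per chunk, a per-table roll-up of the same check makes it easy to see how widespread any drift is (db and tbl are columns in mk-table-checksum’s replicate table):

SELECT db, tbl, COUNT(*) AS chunks_out_of_sync
FROM maatkit.checksum
WHERE this_crc!=master_crc AND db not in ('maatkit','mysql')
GROUP BY db, tbl;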

I hope this helps; it is extremely important to make sure that your masters and slaves actually contain the same data, rather than assuming that replication is keeping them in sync.

3 Responses to “Online Verification That Master and Slaves are in Sync”

  • Mark R says:

    I should warn people who think it’s cool to do SHOW CREATE TABLE on the master & slave, and compare the results, that it doesn’t always return foreign key constraints in the same order.

    It appears that the order in which FK constraints are returned is nondeterministic, so your comparison will have to take this into account; I wrote a perl script to (hackily) parse out the constraints, sort them, and append them at the end. This means I can still see if they are wrong on the slave, but won’t notice if they are in a different order.

    Finally, if you thought you could run a different version on a slave, forget it. When you run a different MySQL version (even minor), all bets are off, especially in statement-based mode.

    We found that when migrating from 4.1.x to 4.1.y:

    * Silent column specification changes happen, or don’t, which makes the schema check fail
    * FLOAT rounds differently.

    Of course these are just the differences we happened to hit.

    You can pretty much forget replicating from 4.x to 5.x and having the data be the same on master & slave – if it even works at all. LOAD DATA crashed the slave in our tests.

    And of course, you can forget about the data being the same if you use MyISAM, or if your master ever crashes. However, a system such as the one described above will detect problems and allow them to be fixed operationally.

    Hopefully in production your master doesn’t crash too often.

  • Mark Callaghan says:

    Master crashes are not a problem with sync_binlog=1 and InnoDB. Slave crashes are the problem. For that you need transactional replication from one of the many InnoDB forks/patches.

    I wrote about a 4.0 to 5.0 upgrade at http://mysqlha.blogspot.com/2008/02/upgrading-from-mysql-40-to-mysql-50.html. We were able to avoid data drift, but that took some work.

    As with most MySQL issues, if you have really high expectations/demands from it, then you need to bring in the consultants or hire them full time.

  • [...] introduction and power, and Part 2 talks about partitioning. Another, explains Pythian’s method for continuously verifying replication sync, and lastly is a farewell post, as Sheeri forks off…. Pythian wishes Sheeri all the best in [...]
