Maatkit Options for Restoring a Slave or Master

Aug 12, 2008 / By Gerry Narvaja

Tags:

The Maatkit toolkit is a real blessing for the MySQL DBA. And while its documentation is pretty good, in some cases it’s necessary to read carefully a second and third time to make sure you are not missing an important piece of information. In this article I will comment on mk-table-chksum and mk-table-sync. My comments are mostly aimed at those DBAs who are considering using these utilities with medium or larger-sized databases.

–replicate

This option allows you to store the checksum results on the master, in a table that will get replicated to the slaves. Although it might seem like overhead for a simple check, it really simplifies your life, especially when used in combination with mk-table-sync. I always use it, for the following reasons:

  • You only need to run mk-table-chksum on the master.
  • A simple query will tell you the slave status.
  • When used with --chunksize, it divides the checking and synching into manageable portions.
  • It’s the best way to get consistent checks between master and slaves.

I always define the table as an InnoDB table to improve concurrency and avoid deadlocks (see the tool documentation for details). This table can be created on any database. Just make sure that it is going to be replicated. Here’s an example of the table definition (straight from the documentation) with the InnoDB specification at the end:

CREATE TABLE checksum (
     db         char(64)     NOT NULL,
     tbl        char(64)     NOT NULL,
     chunk      int          NOT NULL,
     boundaries char(64)     NOT NULL,
     this_crc   char(40)     NOT NULL,
     this_cnt   int          NOT NULL,
     master_crc char(40)         NULL,
     master_cnt int              NULL,
     ts         timestamp    NOT NULL,
     PRIMARY KEY (db, tbl, chunk)
  ) ENGINE = InnoDB;

–chunksize

One of the concerns when checking big tables is the possibility of locking them for a significant time, and thus seriously disrupting the application using the database. Although it is always preferable to run big checks during maintenance windows, that’s not always possible.

My rule of thumb is to try to divide each table in approximately ten chunks. I have also found that chunks larger than 500M rows tend to be long-running. So if you have a concern about performance or concurrency, or both, you will have to find for yourself the right way to size the proper chunks for each table. A colleague of mine came up with the following query to execute on the Information Schema:

SELECT table_schema, table_name, table_rows,
       ROUND(table_rows/11,-5) chunk_size, ROUND(table_rows/(500000 * 10),1) over_threshold_factor
FROM tables
WHERE table_rows > 500000 * 10
ORDER by table_rows;

You can adjust it to suit your needs or local criteria. The advantage of this setting is more apparent when it is used with replicate. In that case, each chunk will be identified by the values of the key used to divide them, making it easier to find and correct errors, whether automatically or manually.

–databases / –tables / –engine

Sometimes a given table and or database might be too large, causing these tests to run for a very long time. In these cases, it’s best to run the utility on a single database and/or table with options tailored to it. For example, use a bigger chunksize — double or triple that used for the rest of the tables. Other options that can be used instead of these are: ignoreengine, ignoredb, and ignoretable. Which one is the most appropriate will depend entirely on each use case.

–print (mk-table-sync only)

Running mk-table-sync with this option will print to stdout the INSERT/UPDATE/DELETE statements that it would apply to the server in question to synchronize it (remember that the utility can be used to restore a master from a slave backup or to bring a slave up to date). This way you can make sure that the operations to be applied are the ones you expected. You can also use the data to sample rows in the table and or databases in question.

Due to issues in dealing with double-wide characters, I wouldn’t use it to apply the changes to the database directly. mysqldump has similar issues.

Putting It All Together

The following is an example taken from real life, run on a table with 2.5 million rows. The database, table, and user names have been modified for privacy.

time perl mk-table-checksum --replicate=test.checksum --algorithm=BIT_XOR 
> localhost --user=mkuser --askpass 
> --databases product --tables damaged_table  
> --chunksize 300000 --emptyrepltbl
Enter password for localhost:
DATABASE TABLE           CHUNK HOST      ENGINE      COUNT                         CHECKSUM TIME WAIT STAT  LAG
product  damaged_table       0 localhost InnoDB     600754 b738a6daf5a741b613248ca2477eedc0   18 NULL NULL NULL
product  damaged_table       1 localhost InnoDB     579800 b00b90ddf83d86718513eb9dad72861d   22 NULL NULL NULL
product  damaged_table       2 localhost InnoDB     540084 4928a70a6c2a7211f59d659e62466342   15 NULL NULL NULL
product  damaged_table       3 localhost InnoDB     512660 be14adf7b411d2b6d432e6af14975b0c   13 NULL NULL NULL
product  damaged_table       4 localhost InnoDB     410632 8757e1a8f39e4a5dddf29095725815f7   14 NULL NULL NULL
product  damaged_table       5 localhost InnoDB     167034 b5ebf8d23ab676c341d4eafd7bab35b9    5 NULL NULL NULL
product  damaged_table       6 localhost InnoDB       6605 af98b69f745c2eb0e81a485bbbea379f    0 NULL NULL NULL

real    1m41.142s
user    0m0.146s
sys     0m0.018s

The corresponding mk-table-sync command would be:

time ./mk-table-sync --print --chunksize 300000 --databases prod1 
--tables deleted_circles --replicate=avail.checksum  
--askpass --verbose --synctomaster u=prod_root,h=localhost

(There’s the nuance of having a different way to specify host and user name for mk-table-sync, but I can live with that.)

Conclusion

These are powerful utilities to keep your MySQL databases in good health. When dealing with medium size and larger installations, however, the default options will not be enough. You will need to factor in table sizes and concurrency issues, as it won’t always be possible to bring the database off-line to perform maintenance, preventive or otherwise.

One Response to “Maatkit Options for Restoring a Slave or Master”

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>