MySQL: RENAME TABLE on Transactional Tables Can Jeopardize Slave Data

Oct 8, 2008 / By Raj Thukral

Tags:

Do you have a master-slave MySQL set up?  Ever do DDL changes on the master?  You may be hit with a serious data integrity bug.  Read on.

One of our clients does a regular rename tables on the master to keep the current table small and archive off old data.  We’d occasionally be hit by a ‘duplicate key’ error on the slave on the current table and have to resolve it manually.  Digging into the issue, I managed to replicate it on demand and filed bug 39675 with MySQL, which subsequently has been verified and slated for fix, though from what it seems only in version 6.0.  The bug affects all versions of MySQL from 4.1 to 6.0.

In a nutshell, here is what happens. The rename tables command only checks for pending transactions or locks in the current session.  If there is a pending transaction in another session on the table being renamed, the rename will succeed, but the order in which the transaction is written to the binlog will be different from the order in which the transactions were applied on the master.  This means that the data on the slave will now be out of sync for this table.

Here’s a test-case:

CREATE TABLE `raj` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB;

CREATE TABLE `raj_next` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB;

Run the following inserts: insert into raj (b) values (1),(2),(3);

Run the following transaction — but do not commit!

begin;
insert into raj(b) values (4);

Now, in a different session, run this:

rename table raj to raj_backup, raj_next to raj;

Back in the original session, commit it: commit;, and then see that the transaction made it to raj_backup:

select * from raj;
Empty set (0.00 sec)

select * from raj_backup;
mysql> select * From raj_backup;;
+----+------+
| id | b    |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

and on the slave, you will see this:

select * from raj;
+----+------+
| id | b    |
+----+------+
|  4 |    4 |
+----+------+
1 row in set (0.00 sec)

select * from raj_backup;
mysql> select * From raj_backup;;
+----+------+
| id | b    |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
+----+------+
3 rows in set (0.00 sec)

Check the binlog and you will see why this is so:

#080926 11:45:36 server id 3  end_log_pos 2356  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443936/*!*/;
CREATE TABLE `raj` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB/*!*/;
# at 2356
#080926 11:45:36 server id 3  end_log_pos 2550  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443936/*!*/;
CREATE TABLE `raj_next` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB/*!*/;
# at 2550
#080926 11:45:40 server id 3  end_log_pos 2578  Intvar
SET INSERT_ID=1/*!*/;
# at 2578
#080926 11:45:40 server id 3  end_log_pos 2679  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443940/*!*/;
insert into raj (b) values (1),(2),(3)/*!*/;
# at 2679
#080926 11:45:40 server id 3  end_log_pos 2706  Xid = 74
COMMIT/*!*/;
# at 2706
#080926 11:45:51 server id 3  end_log_pos 2816  Query   thread_id=4
exec_time=0     error_code=0
SET TIMESTAMP=1222443951/*!*/;
rename table raj to raj_backup, raj_next to raj/*!*/;
# at 2816
#080926 11:45:59 server id 3  end_log_pos 2884  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443959/*!*/;
BEGIN/*!*/;
# at 2884
#080926 11:45:46 server id 3  end_log_pos 2912  Intvar
SET INSERT_ID=4/*!*/;
# at 2912
#080926 11:45:46 server id 3  end_log_pos 3004  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443946/*!*/;
insert into raj(b) values (4)/*!*/;
# at 3004
#080926 11:45:59 server id 3  end_log_pos 3031  Xid = 76
COMMIT/*!*/;
DELIMITER ;

The row with the insert_id of 4 was committed after the rename.  While this made the row go into the old table on the master, it got written to the binlog after the rename and hence goes into the new table on the slave.

MySQL has acknowledged and verified the bug, but it isn’t slated for a fix any time soon, since the worklist is classified as ‘medium’ priority even though the bug itself is classified as severity S1 (critical), Triage: D2 (Serious) needs triage. http://bugs.mysql.com/bug.php?id=39675

and the worklist entry (http://forge.mysql.com/worklog/task.php?id=4284).

Split from WL#3726.
Implement transaction-long metadata locks. I.e. if a table is used in a transaction, a metadata lock should be kept for this table until the transaction is committed. This will prevent concurrent transactions from issuing a DDL against this table. Necessary to fix BUG#989 If DROP TABLE while there’s an active transaction, wrong binlog order

I don’t know if there is a workaround, but for now, it may make sense to review your code to make sure you don’t have an implementation that may be affected.

2 Responses to “MySQL: RENAME TABLE on Transactional Tables Can Jeopardize Slave Data”

  • I don’t know if this is the kind of work-around you’re looking for, but I used to have a similar problem with keeping my archive tables small. The solution is to use partitioning and drop older partitions instead.

    Maybe your client isn’t on 5.1 yet, but that’s going to be GA a lot sooner than 6.0.

    You can see my article about pruning archive tables with partitions here: http://dev.mysql.com/tech-resources/articles/partitioning-event_scheduler.html

  • Rob Wultsch says:

    Tangential:
    “If MySQL encounters any errors in a multiple-table rename, it does a reverse rename for all renamed tables to return everything to its original state. ”

    I notice that earlier this week and have been curious about what happens if the reverse rename fails. I bet it is nothing good. I’ll write up a test case over the weekend…

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>