Beware Starting Slaves in the Position in the master.info file
I’ve seen many a good DBA make the master of starting slaves from the position in the master.info file, most recently this week, that I want to bring it to everyone’s attention. Of course I mean the underlying issue and not the names of the DBA because that would be cruel.
In the typical scenario where this is an issue, the sequence of events is roughly the same with some small variation. A cold backup or a snapshot is restored onto a new server to build out a new slave. The binary log position from the master.info file, which is part of the backup, is used to start replication. Eventually after a short while, someone notices data discrepancies on the new slave compared to the master or replication stops due to an error.
The problem can be best looked by looking the slave status output in MySQL like below:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: msandbox
Master_Port: 26768
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 4723
Relay_Log_File: mysql_sandbox26769-relay-bin.000002
Relay_Log_Pos: 874
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 729
Relay_Log_Space: 1042
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
mysql>
There are two sets of coordinates for replication in consideration here, the Execute Master Position and the Read Master Position. The Execute Master Position, denoted by the ‘Relay_Master_Log_File’ and Exec_Master_Log_Pos’ in the output above, is the position in the master’s binary log of the current statement being executed by the slave from the relay log. The Read Master Position, denoted by the ‘Master_Log_File’ and Exec_Master_Log_Pos’ in the output above, is the position in the master’s binary logs which the slave is reading from and writing to the relay log. So if the slave is lagged for some reason, which can happen often in the MySQL world, the Read Master Position will be different and in fact newer then the Execute Master Position. On the file system, the Read Master Position is stored in the master.info file and the Execute Master Position is stored instead in the relay-log.info file.
So if the slave was lagged when it was shut down for the cold backup or when the snapshot was taken, then the two positions would be different. Consider the following example from a server which was lagged behind.
The master.info file shows
--(Wed:20110112:1527)-(0:$)-- more master.info 15 mysql-bin.000001 4723 127.0.0.1 msandbox msandbox 26768 60 0 0 --(wang@polygon)-(/home/wang/sandboxes/rsandbox_5_1_53/node1/data)--
The relay-log.info file shows
--(Wed:20110112:1527)-(0:$)-- more relay-log.info ./mysql_sandbox26769-relay-bin.000002 874 mysql-bin.000001 729 --(wang@polygon)-(/home/wang/sandboxes/rsandbox_5_1_53/node1/data)--
When the replication is started on the new server from the position in the master.info file, the statements in the binary logs between the Execute Master Position and the Read Master Position get skipped and cause data corruption.
In the end, the solution is to always start a new slave using the position in the relay-log.info file.
Share this
Share this
More resources
Learn more about Pythian by reading the following blogs and articles.
When SHOW SLAVE STATUS and the error log Disagree
When SHOW SLAVE STATUS lies
Replication Issues: Never purge logs before slave catches them!!
Ready to unlock value from your data?
With Pythian, you can accomplish your data transformation goals and more.