When SHOW SLAVE STATUS and the error log Disagree

Or, When MySQL Lies!
When I do a show slave status\G
, sometimes mysqld will lie to me and give me a wrong Exec_Master_Log_Pos
. Let me explain with a situation from last night.
This is the output of show slave status\G
from mysql version 5.0.41-community-log:
mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: XXX.XXX.XXX.XXX Master_User: replic_username Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000480 Read_Master_Log_Pos: 690470773 Relay_Log_File: db2-relay-bin.000028 Relay_Log_Pos: 683977007 Relay_Master_Log_File: mysql-bin.000480 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Skip_Counter: 0 Exec_Master_Log_Pos: 126 Relay_Log_Space: 690471192 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL
So in summary, the slave SQL thread is stuck (in this case because of a problem during the transfer of the binlog data to the slave’s relay log). The show slave status\G
command tells me that it is stuck at the master binlog file mysql-bin.000480
, position 126
.
But, if I look at the error log file entries when the slave got stuck I see:
080206 16:03:48 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) 080206 16:03:48 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000480' position 683976897 080206 16:03:48 [Note] Slave: connected to master 'replic_username@XXX.XXX.XXX.XXX:3306',replication resumed in log 'mysql-bin.000480' at position 683976897 080206 16:04:31 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 1966107, event_type: 16 080206 16:04:31 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error 080206 16:04:31 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0 080206 16:04:31 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000480' position 126
I’ll compare both outputs. show slave status
tells me that I’m stuck at:
Relay_Master_Log_File: mysql-bin.000480 Exec_Master_Log_Pos: 126
The error log file says at first that it is stuck at:
'mysql-bin.000480' position 683976897
then later in the error log is says it is stuck at:
'mysql-bin.000480' position 126
So obviously, the last information given in the error log as well as in the show slave status
command is wrong. If I had to choose, I would restart the slave at log mysql-bin.000480
position 683976897
using the command: CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000480', MASTER_LOG_POS=683976897;
. If not, I’ll be going to try to re-run a full binlog that has been already applied.
So, when a slave gets stuck on relay log corruption, my own general rules are as follows:
- If the position is less then, say, 500, suspect that that the position is wrong (it might be right but it needs to be checked out).
- Always go in the error log to see what it has to say about position.
- If the position in the error log as well as in the
show slave status
agree, more-or-less (both are a big number), choose the one fromshow slave status
. Otherwise, pick the one in the error log.
NOTE: Please understand that it is possible for the slave to be stuck at a small position. This would happen if the master did change the binlog and the slave got stuck just after the switch of the master binlog. So be careful.