When SHOW SLAVE STATUS and the error log Disagree

Apr 25, 2008 / By Yannick Gratton

Tags: ,

Or, When MySQL Lies!

When I do a show slave status\G, sometimes mysqld will lie to me and give me a wrong Exec_Master_Log_Pos. Let me explain with a situation from last night.

This is the output of show slave status\G from mysql version 5.0.41-community-log:

mysql> show slave status \G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: XXX.XXX.XXX.XXX
                   Master_User: replic_username
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000480
           Read_Master_Log_Pos: 690470773
                Relay_Log_File: db2-relay-bin.000028
                 Relay_Log_Pos: 683977007
      Relay_Master_Log_File: mysql-bin.000480
              Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB:
           Replicate_Ignore_DB: 
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 126
               Relay_Log_Space: 690471192
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: NULL

So in summary, the slave SQL thread is stuck (in this case because of a problem during the transfer of the binlog data to the slave’s relay log). The show slave status\G command tells me that it is stuck at the master binlog file mysql-bin.000480, position 126.

But, if I look at the error log file entries when the slave got stuck I see:

080206 16:03:48 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
080206 16:03:48 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000480' position 683976897
080206 16:03:48 [Note] Slave: connected to master 'replic_username@XXX.XXX.XXX.XXX:3306',replication resumed in log  'mysql-bin.000480' at position 683976897
080206 16:04:31 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 1966107, event_type: 16
080206 16:04:31 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
080206 16:04:31 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
080206 16:04:31 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000480' position 126

I’ll compare both outputs. show slave status tells me that I’m stuck at:

Relay_Master_Log_File: mysql-bin.000480
Exec_Master_Log_Pos: 126

The error log file says at first that it is stuck at:

'mysql-bin.000480' position 683976897

then later in the error log is says it is stuck at:

'mysql-bin.000480' position 126

So obviously, the last information given in the error log as well as in the show slave status command is wrong. If I had to choose, I would restart the slave at log mysql-bin.000480 position 683976897 using the command: CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000480', MASTER_LOG_POS=683976897;. If not, I’ll be going to try to re-run a full binlog that has been already applied.

So, when a slave gets stuck on relay log corruption, my own general rules are as follows:

  1. If the position is less then, say, 500, suspect that that the position is wrong (it might be right but it needs to be checked out).
  2. Always go in the error log to see what it has to say about position.
  3. If the position in the error log as well as in the show slave status agree, more-or-less (both are a big number), choose the one from show slave status. Otherwise, pick the one in the error log.

NOTE: Please understand that it is possible for the slave to be stuck at a small position. This would happen if the master did change the binlog and the slave got stuck just after the switch of the master binlog. So be careful.

3 Responses to “When SHOW SLAVE STATUS and the error log Disagree”

  • Xaprb says:

    There are three sets of file/position coordinates in SHOW SLAVE STATUS:

    1) The position, ON THE MASTER, from which the I/O thread is reading: Master_Log_File/Read_Master_Log_Pos.

    2) The position, IN THE RELAY LOGS, at which the SQL thread is executing: Relay_Log_File/Relay_Log_Pos

    3) The position, ON THE MASTER, at which the SQL thread is executing: Relay_Master_Log_File/Exec_Master_Log_Pos

    You are mixing and matching from different sets, and you can’t do that. In many cases they’ll have the same thing and it doesn’t matter. But not in the general case.

  • gratton says:

    Thanks Xaprb! I had highlighted the wrong line. It has been fixed.
    Thanks again for catching it!

  • Venkatesh says:

    Hi,

    What if “exec_master_log_pos” value is more than “read_master_log_pos” value? The situation is that replication is broken because of a missing table at slave end that needs to be updated.

    What does this mean?

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>