When SHOW SLAVE STATUS lies

Posted in: Technical Track

Over-the-Top Tales from the Trenches.
Motto: Bringing order to the chaos of every day DBA life.

So you have got your nice MySQL Master-Slave replication pair setup. Everything is sweet, then the master dies/restarts or you have a slightly extended network outage.

Your monitoring software (in our case Avail) fires off a page and you are rudely interrupted from reading the Pythian blog.

These real world interruptions, what can I say… it pays the bills.

Anyway being the rounded DBA or enlightened DBA as Babette would say, you are capable of handling any type of database. You log into the machine and check out why the slave threw an error or if your monitoring is slow, why the slave is lagging by 2 hours.

You run SHOW SLAVE STATUS\G

mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: x.x.x.x
                Master_User: rep
                Master_Port: 3306
              Connect_Retry: 60

Master_Log_File: mysql-bin.000480 Read_Master_Log_Pos: 690470773 Relay_Log_File: db2-relay-bin.000028 Relay_Log_Pos: 683977007 Relay_Master_Log_File: mysql-bin.000480

           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB: avail,avail
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted
                 		(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted
                 		(you can check this by running 'mysqlbinlog' on the relay log), a network problem,
                 		or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log,
                 		you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
               Skip_Counter: 0

Exec_Master_Log_Pos: 126

            Relay_Log_Space: 690471192
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

The bits which are important are in red.

Now the normal procedure for this kind of error is to get the Exec_Master_Log_Pos and tell the slave to restart from there. Using a command like:

change master to Master_Log_File='mysql-bin.000480', Master_Log_Pos=126;
start slave;

In this case you can kiss goodbye to your slave as a consistent copy of your master database. SHOW SLAVE STATUS has basically lied through its teeth. The master_log_pos is not 126.

To quote my mate Yannick:

this is the error log file entries when the problem happened:

080206 16:03:48 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
080206 16:03:48 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log '

mysql-bin.000480′ position 683976897

080206 16:03:48 [Note] Slave: connected to master 'rep@x.x.x.x:3306',replication resumed in log 'mysql-bin.000480' at position 683976897
080206 16:04:31 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 1966107, event_type: 16
080206 16:04:31 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
080206 16:04:31 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted
(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running
'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master'
s binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
080206 16:04:31 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE
START". We stopped at log

‘mysql-bin.000480’ position 126

Comparing both output, “show slave status” tells us that “I’m stuck at …”

Master_Log_File: mysql-bin.000480
Exec_Master_Log_Pos: 126

while the error log file says at first that it is stuck at:

log 'mysql-bin.000480' position 683976897

then later in the error log is says it is stuck at:

mysql-bin.000480' position 126

So If I had to choose I would choose to restart the slave at log ‘mysql-bin.000480’ position 683976897.
If not you are going to try to re-run a full binlog that has been already applied.

Yannick’s Rules for dodgy master log positions when a slave gets stuck on the relay log corruption:

  1. If the position is less then let says 500, suspect that the position is wrong (it might be right but it needs to be checked out).
  2. Always go and check the MySQL error log to see what it has to say about position.
  3. If the position in the error log as well as in the “show slave status” agree more or less (both are a big number), then normally I would pick the one from “show slave status”.
  4. If not, then I would pick the one in the error log.

NOTE: you must understand that it is possible for the slave to be stuck at a small position.
This would happen if the master did change binlog and the slave got stuck just after the switch of the master binlog, so you have to be careful.

Have Fun

Paul

email

Interested in working with Paul? Schedule a tech call.

About the Author

Database Specialist based in Sydney,Australia

8 Comments. Leave new

kamiar kanani
April 7, 2008 3:06 am

I think these are tow different errors, one inside I/O thread and another in sql thread. error related to I/O thread occurred at :
‘mysql-bin.000480′ position 683976897
and error related to SQL thread at :
‘mysql-bin.000480′ position 126

Reply
Harrison Fisk
April 7, 2008 8:35 am

What version was this on?

I recently worked with the replication team to find and fix bugs of this nature. The resulting fixes were pushed into 5.0.54 and 5.1.23, and I haven’t seen one case of SHOW SLAVE STATUS being wrong in this manner since then.

The bugs in question were:

http://bugs.mysql.com/bug.php?id=31190
http://bugs.mysql.com/bug.php?id=12691

Thanks!

Reply

Hi Harrison,

Thanks for the update. This was MySQL 5.0.41

Reply
Log Buffer #92: a Carnival of the Vanities for DBAs
April 11, 2008 11:39 am

[…] Here on the Pythian Group Blog, Paul Moen posted about a situation in which SHOW SLAVE STATUS lies. […]

Reply

A similar issue, but equally nasty:

http://bugs.mysql.com/bug.php?id=26540

If the slave server crashes (not mysqld process) then the syncronisation point will be lost and can not be recoved. This effectively requires the slave to be rebuilt. Very nasty. MySQL SHOULD be flushing the .info files on every commit to ensure consistency.

If not done the difference between the “.info” files and the real committed last transaction can be as long as 30 seconds. When the slave starts up it will try to reapply transactions from 30 seconds previously…

Tell your slaves not to crash and don’t pull the power on them. If they do, you’ll need to rebuild them.

Reply

Hi Paul,
I tried to replication of master-slave of mysql server. 2 problems are there
1. After setting up in master when I run
show master status;
It shows
mysql> show master status;
+——————+———-+————–+——————+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+——————+———-+————–+——————+
| mysql-bin.000001 | 98 | test,test | |
+——————+———-+————–+——————+
1 row in set (0.00 sec)

Why 2 same database names will come in Binlog_Do_DB col, where I was configured only under [mysqld]

log-bin=mysql-bin
binlog-do-db=test
server-id=1

2. In slave server when I call show slave status\G
it shows
**************************
Slave_IO_State:
Master_Host: 192.168.0.50
Master_User: slave_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: mysqld-relay-bin.000001
Relay_Log_Pos: 98
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 98
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

Slave_IO_State value is blank. can u please let me know why this happened ? and this is not working at all.
Please revert back with u’r expert comments

Reply

You would need to post the error log after trying to start the slave, not the whole thing. As it will have more detail as to why the slave SQL thread is running and the slave IO thread has stopped.

Reply

Paul, I am using 5.0.45, and have been hunting quite a while for something which properly explained this. You have completely saved me from messing up replication. I followed your example, read my error log carefully, and got everything back up and running. Thank you so very, very much for taking the time to post this.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *