Posted by Mohammed Mawla on Mar 12, 2009
In SQL Server Replication, a regular check is getting the number of undistributed commands for a transactional subscription. This can be done using the replication monitor (SQlmonitor.exe from the RUN prompt) or programmatically by running sp_replmonitorsubscriptionpendingcmds at the Distributor on the distribution database.
One issue with sp_replmonitorsubscriptionpendingcmds is that you will have to submit a few parameters without defaults (the required parameters are mentioned in the above link).
I wanted once to get status information about all subscriptions (Last logged message, status, agent, etc.), plus the number of undistributed commands for each subscription. For the replication info there is the MSdistribution_agents table, which is stored in the Distribution database and contains one row for each Distribution Agent. For the Agents’ history there is the MSdistribution_history table, which contains the history for Distribution agents, and is also stored in the Distribution database.
To return this, my idea was to get a list of publishers, publications, subscribers, and so on, into a temporary table, loop against the result, and run sp_replmonitorsubscriptionpendingcmds for each row, and then join against MSdistribution_history for the final result.
Here it is. All comments and additions are welcome.
Read the rest of this entry . . .
Posted by Sheeri Cabral on Feb 19, 2009
Slaves can be used for:
- Horizontal read scalability — take the load off a master database by spreading reads to a replicated slave.
- Disaster recovery — some disasters, such as a hardware failure, can be solved by having a slave ready to propagate to a master. This technique also works to make offline changes to a master/slave pair without having database downtime (see below).
- Consistent Backups — without disrupting production usage, a slave can be used to take a consistent backup by stopping the database and copying the database files. This is a free and uncomplicated way to get consistent backups (innodb hot backup is not free, and using a snapshotting tool (such as LVM’s snapshot capability) can be complex. Not everyone wants to manage snapshots.)
Be careful when using a slave for more than one purpose. Using a slave for more than one purpose can be done, but carefully. For example:
If one slave instance serves the needs of 1 and 2 — Where do the reads go when the slave is promoted to a master (or taken offline for maintenance)?
If one slave instance serves the needs of 1 and 3 — can the read slave be down for the length of time it takes to take a backup? If not, this solution is not appropriate.
If one slave instance serves the needs of 2 and 3 — Where does the backup get taken from when the slave is promoted to master (or taken offline for maintenance)?
There are certainly workarounds. We either create a propagation procedure (whether for maintenance or in an emergency) or assess an existing one, to cover these cases to make sure any dual-use of a slave does not leave a client hanging. We regularly test whether slaves are in sync, to ensure accuracy no matter what purposes replication slaves are used for. We also go through actual slave-to-master propagations. This can be useful as a drill, and can also be useful to avoid downtime.
Avoiding Downtime with Replication
In many cases, using replication can avoid downtime during an offline database change (for example, an OPTIMIZE TABLE to defragment a table). To implement this, the slave is taken out of service, the change performed on the slave, the slave is put back into service and propagated to be the master. Then the previous master is taken offline, the change applied to the previous master, and the previous master can return to service (either as a slave of the current master, or by propagating the previous master to be the current master.
Posted by Gerry Narvaja on Jan 27, 2009
One very helpful use of the technique Sheeri described in Remote connections without leaving the mysql shell is making sure that replication is working properly.
According to the MySQL Reference Manual’s section on SHOW SLAVE STATUS Syntax, it shows information corresponding to the slave thread in the slave server. When replication is broken, however, or not working properly due to network issues between master and slave, this information may not be accurate. This has improved over recent releases, but it’s still not perfect.
The question, then, is: how to be 100% sure (or as close as you can get to 100%) that replication is running fine? The answer, as offered by Sheeri: use CONNECT.
Example
- Log into the slave using the mysql client and issue
SHOW SLAVE STATUS:
Read the rest of this entry . . .
Posted by Sheeri Cabral on Oct 12, 2008
Recently I had an interesting issue crop up. Due to an unfortunate migration incident in which involved master/master replication and not checking to see if replication was caught up, we ended up with an infinite replication loop of a number of SQL statements. awk helped immensely in the aftermath cleanup.
The basics of the replication infinite loop were Read the rest of this entry . . .
Posted by Sheeri Cabral on Oct 9, 2008
This is a post about SYSDATE() and NOW() and CURRENT_TIMESTAMP() functions in MySQL.
Firstly, note is that of these three, only CURRENT_TIMESTAMP() is part of the SQL Standard. NOW() happens to be an alias for CURRENT_TIMESTAMP() in MySQL.
Secondly, note that replication does not work well with non-deterministic functions. And “hey, what time is it?” is non-deterministic. Ask it twice, with a second apart between asking, and both times you get different results (with at least second precision).
You can start to see the problem here….but there’s more…. Read the rest of this entry . . .
Posted by Raj Thukral on Oct 8, 2008
Do you have a master-slave MySQL set up? Ever do DDL changes on the master? You may be hit with a serious data integrity bug. Read on.
One of our clients does a regular rename tables on the master to keep the current table small and archive off old data. We’d occasionally be hit by a ‘duplicate key’ error on the slave on the current table and have to resolve it manually. Digging into the issue, I managed to replicate it on demand and filed bug 39675 with MySQL, which subsequently has been verified and slated for fix, though from what it seems only in version 6.0. The bug affects all versions of MySQL from 4.1 to 6.0.
In a nutshell, here is what happens. The rename tables command only checks for pending transactions or locks in the current session. If there is a pending transaction in another session on the table being renamed, the rename will succeed, but the order in which the transaction is written to the binlog will be different from the order in which the transactions were applied on the master. This means that the data on the slave will now be out of sync for this table.
Here’s a test-case:
Read the rest of this entry . . .
Posted by Keith Murphy on Oct 2, 2008
Over the weekend, I worked on a client’s two computers, trying to get a slave in sync with the master. It was during this time that I began thinking about:
a) how this never should have happened in the first place.
b) how “slave drift” could be kept from happening.
c) how this is probably keeping some businesses from using MySQL.
d) how MySQL DBAs must spend thousands of hours a year wasting time fixing replication issues.
I’ll be the first person to tell you that the replication under MySQL is pretty much dead-simple to set up. My only complaint is that it is annoying to type in the two-line “CHANGE MASTER” command to set up a new slave. Even so, it makes sense.
It is also very easy, however, for a slave to end up with different data than the master server has. This can be caused by replication bugs, hardware problems, or by using non-deterministic functions. Without proper permissions, a user/developer/DBA can log into the slave server and mess the data up that way. This last is a database administrator problem, but it affects replication. There are probably other issues that astute readers will point out.
I would like to point out one common issue that would probably be categorized as a replication bug. If the master crashes for whatever reason (say, a hosting company accidentally punches the power button on a master server) it will often cause corruption of the binary log. When the master comes back up, the slave cries about a non-existent binary log position.
Possible solutions: Read the rest of this entry . . .
Posted by Michelle Gutzait on Aug 11, 2008
I am often asked what ways are there to minimize downtime when upgrading from SQL Server 2000 to SQL Server 2005, or when moving databases to a new (probably bigger and stronger) environment.
Well, if you can afford having both–old environment and new environment in parallel–this task can be very easy and straightforward . . .
Options
First, let’s see what are the options to copy a database from one server to the other.
- Detaching and re-attaching the database.
- Using backup and restore.
- Using the database copy wizard.
- Manually (creating schemas + transferring data), probably by using DTS/SSIS packages.
- And here is another cute method: use the snapshot creation and delivery part of the replication.
I am not fond of the third method — the wizard never works very well for me. The fourth method? — nah, I am too lazy. Attaching and detaching database files? — the original database goes offline until the files are moved to the new location; it’s not what we want.
If the database is small to medium size, the replication snapshot can be used. If the database is larger, backing and restoring the databases might be the best solution. In these methods, the original database can stay on line while the secondary database is being built.
If you have large databases
What are the tasks that usually take the most time when moving large databases using backup and restore?
Read the rest of this entry . . .
Posted by Paul Moen on Apr 6, 2008
Over-the-Top Tales from the Trenches.
Motto: Bringing order to the chaos of every day DBA life.
So you have got your nice MySQL Master-Slave replication pair setup. Everything is sweet, then the master dies/restarts or you have a slightly extended network outage.
Your monitoring software (in our case Avail) fires off a page and you are rudely interrupted from reading the Pythian blog.
These real world interruptions, what can I say… it pays the bills.
Anyway being the rounded DBA or enlightened DBA as Babette would say, you are capable of handling any type of database. You log into the machine and check out why the slave threw an error or if your monitoring is slow, why the slave is lagging by 2 hours.
You run SHOW SLAVE STATUS\G
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: x.x.x.x
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000480
Read_Master_Log_Pos: 690470773
Relay_Log_File: db2-relay-bin.000028
Relay_Log_Pos: 683977007
Relay_Master_Log_File: mysql-bin.000480
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: avail,avail
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted
(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted
(you can check this by running 'mysqlbinlog' on the relay log), a network problem,
or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log,
you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 126
Relay_Log_Space: 690471192
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
The bits which are important are in red.
Now the normal procedure for this kind of error is to get the Exec_Master_Log_Pos and tell the slave to restart from there. Using a command like: Read the rest of this entry . . .