There is a time in every SQL Server DBA career where a mail came in with a “replication is not working, please check” message. This article is intended to provide with quick tips on how to handle common replication errors and performance problems in a one way transactional replication topology
You check replication monitor and get a :
“Transaction sequence number: 0x0003BB0E000001DF000600000000, Command ID: 1”
The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)
Note the sequential number will be used in the following scripts, also the commandID is important to note as not necessarily the whole sequential number has issues, it might be tied to just one command.
Go to the distributor database en run the following command to get the list of articles involved in this issue:
select * from dbo.MSarticles
where article_id in (
select article_id from MSrepl_commands
where xact_seqno = 0x0003BB0E000001DF000600000000)
To get the whole list of commands you can run below query
exec sp_browsereplcmds
@xact_seqno_start = ‘0x0003BB0E000001DF000600000000’,
@xact_seqno_end = ‘0x0003BB0E000001DF000600000000’
With this last query you can get to the exact command that is failing (by searching the command number in the commandID column)
You will notice that a transactional replication will typically(depending on setup) use insert, delete, update stored procedures to replicate the data, so the command you will see over here will look something like:
{CALL [sp_MSdel_dboMyArticle] (118)}
That is the stored procedure generated to process delete statement over dbo.MyArticle table, and in this case it is trying to delete ID 118. Based on the error reported you will now realize that the issue is that the replication is trying to delete MyArtcile on ID 118 and is not there, so it is trying to delete a non existent record.
Options:
After checking the replication monitor you get a message like:
Query timeout expired
The process is running and is waiting for a response from the server
Initializing…
and then terminating with this error…
Agent ‘MyAgent’ is retrying after an error, YY retries attempted
This can be due to several reasons:
When a replication is marked as expired, it will tell you that you need to reinitialize.
To activate it “under the hood”, check your replication monitor last error, it will show you the last sequential number that tried to process, then run this command(using the corresponding seq_no):
update MSsubscriptions
set status=2
where subscription_seqno=0x0002AADE00005030000100000002
The status column means:
0 = Inactive.
1 = Subscribed.
2 = Active.
You can change it to Active and it will try to process again. Why would you use this? if the subscription expired but your distribution cleanup job haven´t run, then it can try to reprocess everything again, if the issue was related to a network time out and now you have your network back up, you can try this as it will try to start from the last sequential number. Also you can try to do this to reproduce the last error reported, so it will fail and eventually expire again but you will have a better idea on why it failed in the first place.
A slow replication, and by slow I mean when you know that your replication is experiencing a delay when your command goes from the distributor to the subscription, you can check this with performance counters or quickly insert a token(https://technet.microsoft.com/en-us/library/ms151846%28v=sql.105%29.aspx)
You can improve the performance by adding streams, normally a default setting will write sequentially the replication transactions one by one, with Streams you can add more threads, say you specify to use 4 strems, you will be processing 4 transactions at a time meaning a faster turnaround. This can work beautifully but it can also generate deadlocks and inconsistencies, I would recommend to start low and just add 1 stream at a time and stop when you start seeing a problem. Do not go crazy and feel this is a turbo button and add 30 streams, and like most features, test it in QA first!
To Enable this option follow these steps:
You might encounter some issues when implementing this, you can read this KB for further info:
https://support.microsoft.com/kb/953199
There are many tips on how to fix a replication, sometimes is easier to just reinitialize, but sometimes this is not an option when critical systems depend on the subscription to be up to date or your database is so huge that it will take days to complete. When possible try to troubleshoot instead of just restarting the replication from scratch as it will give you a lot more insight on what is going on.
Ready to optimize your SQL Server Database for the future?