Troubleshooting Transactional Replication Latency Using Agent Statistics
In this blog post, we will look at how you can leverage the Agent statistics to troubleshoot the replication latency issues. Before understanding how to decode the agent statistics, let's take a look at the some of the basic things that will help us troubleshoot the replication performance issue in a better way. The following MSDN diagram depicts the transactional replication architecture in a simple manner. Troubleshooting latency issues is a multi-step approach. The first step is to identify which agent is slow.
- Log Reader Agent (Publisher to Distributor)
- Distribution Agent (Distributor to Subscriber)
Note: Each thread will have their own buffer with 40k in size. Here, we need to look at the wait time to understand where the bottleneck exists.For example, if you notice that the wait time for the Reader thread is high, then it essentially means your Writer thread is slow since the Reader thread is waiting for Writer to release the buffer. Similarly, if you notice high wait times for your Writer thread then your Reader thread is slow. The easiest way to decode this is, HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the commands to distribution database) HIGH Wait time on Writer thread = Reader thread is slow ( thread which scans the transaction log)– State = 1 means stats after batch commit
–Work = cumulative time spent by the agent since restart - idle time ( seconds)
–Idle = Time spent waiting to call sp_replcmds
–Reader fetch = Time to do execute sp_replcmds
Wait = Time spent waiting on writer to release buffer
–Writer write = Time spent writing commands into distribution database
Wait = Time spent waiting on reader to populate buffer
Distribution Agent Statistics
<stats state="1" work="154" idle="351464"> <reader fetch="144" wait="11"/> <writer write="12" wait="338"/> <sincelaststats elapsedtime="305" work="10" cmds="81262" cmdspersec="8041.000000"><reader fetch="0" wait="9"/><writer write="10" wait="0"/></sincelaststats></stats> – State =1 means stats after a batch commit – Work = cumulative time spend by the agent since restart - idle time – Idle = Time spend waiting to call sp_msget_repl_commands – Reader fetch = Time to do execute sp_msget_repl_commands Wait = Time spent waiting on writer to release buffer. – Writer write = Time spend writing commands into distribution database Wait = Time spent waiting on reader to populate buffer. Similar to log reader agent, the decoding of wait time is same way we did for log reader agent. HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the subscriber database using batched RPC Calls) HIGH wait time on Writer thread = Reader thread is slow ( thread which takes the pending commands from Distribution database) Distributor Writer Thread Slow Scenario We would be able to understand this concept better by looking at sample statistics. In the example below, I explicitly started the transaction on Subscriber table to simulate blocking on the subscriber side, making the writer thread of the distribution agent wait and build up latency. This is how the stats looked.- <stats state="1" work="755" idle="354505">
- <reader fetch="153" wait="604"/>
- <writer write="613" wait="346"/>
- <sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000"><reader fetch="0" wait="515"/><writer write="515" wait="0"/></sincelaststats></stats>
Share this
Previous story
← Duplication: divide et impera
You May Also Like
These Related Stories
Extracting DML Stats/Workload With GoldenGate
Extracting DML Stats/Workload With GoldenGate
Apr 26, 2020
1
min read
Methodology for Snowflake Role-Based Access Control
Methodology for Snowflake Role-Based Access Control
Sep 14, 2021
6
min read
How to Make an In-Database listener.log File
How to Make an In-Database listener.log File
Apr 7, 2008
1
min read
No Comments Yet
Let us know what you think