Oracle Standby Recovery Rate Monitoring

Jan 24, 2008 / By Paul Moen

Tags: ,

So you have created your standby database using the RMAN DUPLICATE command, you have set the ARCHIVE_LAG_TARGET to maintain a minimum lag target, and you have sorted out those nasty datafile missing errors using automatic file management. You’ve even added standby redo logs to improve the Mean Time To Recovery (MTTR).

Now management are demanding to know why the standby periodically lags during the day. They have you on the phone and they are asking what the current redo apply rate is.

Luckily, you read the Pythian blog regularly, or like most people google for a solution or script to help out so you don’t reinvent the wheel.

Management’s first question was, does the standby redo apply rate lag during the day and if so, when. The following SQL script joins v$archived_log to v$dataguard_status, which contains the most recent messages from Data Guard. We join to that table to get the timestamp for when a specific archive log started to be applied.

rem Reports standby apply rate with lag
rem
select TIMESTAMP,completion_time "ArchTime",
SEQUENCE#,round((blocks*block_size)/(1024*1024),1) "SizeM",
round((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60,1) "Diff(sec)",
round((blocks*block_size)/1024/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),1) "KB/sec",
round((blocks*block_size)/(1024*1024)/ decode(((TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP)
OVER (order by TIMESTAMP))*24*60*60),0,1,
(TIMESTAMP-lag(TIMESTAMP,1,TIMESTAMP) OVER (order by TIMESTAMP))*24*60*60),3) "MB/sec",
round(((lead(TIMESTAMP,1,TIMESTAMP) over (order by TIMESTAMP))-completion_time)*24*60*60,1) "Lag(sec)"
from v$archived_log a, v$dataguard_status dgs
where a.name = replace(dgs.MESSAGE,'Media Recovery Log ','')
and dgs.FACILITY = 'Log Apply Services'
order by TIMESTAMP desc;

The script reports the time it took to apply the log, the size of the log, and the redo apply rate for that log.

  • Diff(sec) reports the actual time difference between redo logs applied on the standby.
  • Lag(sec) reports the time difference between when the archive log was completed on the primary and when it was applied on the standby.

Management’s second question was, how fast is the current archived redo log being applied? The shell script mrp-recovery-rate.sh answers that question.

Basically it gets the log_block_size and polls v$managed_standby twice with a 30-second delay. It then returns the redo apply rate in KB/second, MB/second, and the raw bytes/second.

Note: This is a beta version — it does not handle an application rate faster than 30-seconds per log.

This script was developed on the back of the best practices document: MAA – Data Guard Redo Apply and Media Recovery Best Practices 10gR1 (PDF). Oracle provided the formula, we are providing the script.

References

Have Fun!
Paul.

#!/bin/bash
# mrp-recovery-rate.sh
# Created: 2007-10-10
#
LOG_BLOCK_SIZE=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
SELECT LEBSZ FROM X\\$KCCLE WHERE ROWNUM=1;
exit;
END`

BLOCK_BEG=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0';
exit;
END`

TIME_BEG=`date +%s`
echo $BLOCK_BEG

sleep 30

BLOCK_END=`$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" <<END
set pages 0 trimsp on feed off timing off time off
select BLOCK# from V\\$MANAGED_STANDBY where PROCESS='MRP0';
exit;
END`

TIME_END=`date +%s`
echo $BLOCK_END

DIFF_TIME=`expr $TIME_END - $TIME_BEG`
DIFF_BLOCKS=`expr $BLOCK_END - $BLOCK_BEG`
DIFF_SIZE=`expr $DIFF_BLOCKS \* $LOG_BLOCK_SIZE`
DIFF_SIZE_TIME=`expr $DIFF_SIZE / $DIFF_TIME`
MB=`expr 1024 \* 1024`
RECOVERY_RATE=`expr $DIFF_SIZE_TIME / 1024`
RECOVERY_RATE_MB=`expr $DIFF_SIZE_TIME / $MB`

echo $LOG_BLOCK_SIZE
echo $DIFF_TIME
echo $RECOVERY_RATE
echo $RECOVERY_RATE_MB

6 Responses to “Oracle Standby Recovery Rate Monitoring”

  • jason arneil says:

    I think this is a great way of seeing how well your standby is keeping up-to-date and how it has done historically. I use v$dataguard_stats to tell me how the standby is doing, and monitor this, which shows the apply lag and the trasnport lag.

    I should point out (though you probably are aware) the sql script does not work with real time apply.

  • paulm says:

    Hi Jason,

    Thanks for the comment.

    Unfortunately the view v$dataguard_stats shows real-time/most recent stuff and it is only available from 10G onwards. Nice way to track it though. Maybe Oracle will add a history table in the next release of 11G perhaps.

    Yep the SQL also doesn’t work for manual redo log apply. For example if you are using your own scripts to apply redo logs on a standby database (anything not Enterprise Edition).

    Have Fun

    Paul

  • […] Paul Moen from Pythian wrote about a script he uses to keep a historical record of the apply rate. However if you want something more immediate you can maybe use a view called V$STANDBY_APPLY_SNAPSHOT. There does not appear to be much information out there on this view, and it was only recently that I stumbled across it. This is not documented in the Oracle documentation  in either the 10g or 11g docs. There is also nothing about this view on metalink as a search turned up no hits. The view gets a brief mention in a talk by Dataguard product manager, Larry Carpenter. […]

  • paulm says:

    If you want the link to the Carpenter presentation. It is on Oracle’s website.

    http://www.oracle.com/technology/deploy/availability/pdf/S938_Carpenter.ppt.pdf

  • […] has been discussed before by my colleague Paul Moen in his article on Oracle standby recovery rate monitoring, but I’ve made a recent discovery that makes it easier to generate both statistics on log […]

  • kalyan chakravarthy says:

    Hi,

    I think scrpt “Reports standby apply rate with lag” shows the completed time on standby not production , can i get a script that shows the time difference between when the archive log was completed on the primary and when it was applied on the standby

    Regards
    Kalyan

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>