Mining the AWR to Identify Performance Trends

Oct 31, 2013 / By Maris Elsins


Sometimes it’s useful to check how the performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer these questions. The data is there, but it is not always easy to retrieve, especially if you want to see how the performance changes over time. I’ve been using three really simple scripts to retrieve this information from the AWR. They help me answer the following questions:

  • How does the performance of a particular SQL change over time?
  • How do wait times of a particular wait event change over time?
  • How does a particular statistic change over time?


Please note: the scripts provided here require a Diagnostic Pack license, and it is your responsibility to make sure you have one before running them.

SQL performance

I use the script awr_sqlid_perf_trend.sql to check how the performance of a SQL statement changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires three input parameters:

  1. SQL ID
  2. Days to report. The script summarizes all AWR snapshots starting with "trunc(sysdate)-{days to report}+1", so if you pass "1", it summarizes all snapshots from today; if you pass "2", then yesterday and today are included.
  3. Interval in hours. "24" will produce one row for each day; "6" will give four rows a day.
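For reference, the core idea behind such a report can be sketched with a simplified query. This is not the published awr_sqlid_perf_trend.sql: the column list is reduced, the bucketing is simplified to a daily summary, and sql_id and days_history are assumed to be passed as SQL*Plus substitution variables.

```sql
-- Simplified sketch only: daily per-execution averages for one SQL_ID.
-- DBA_HIST_SQLSTAT already stores per-snapshot deltas in the *_DELTA columns,
-- so summing them per bucket and dividing by executions gives the averages.
select trunc(cast(hs.begin_interval_time as date)) time,
       sum(hss.executions_delta) executions,
       round(sum(hss.elapsed_time_delta) / 1e6
             / nullif(sum(hss.executions_delta), 0), 3) elapsed_time_s_1exec,
       round(sum(hss.buffer_gets_delta)
             / nullif(sum(hss.executions_delta), 0), 3) buffer_gets_1exec
  from dba_hist_sqlstat hss
  join dba_hist_snapshot hs
    on hs.snap_id = hss.snap_id
   and hs.instance_number = hss.instance_number
 where hss.sql_id = '&sql_id'
   and hs.begin_interval_time >= trunc(sysdate) - &days_history + 1
 group by trunc(cast(hs.begin_interval_time as date))
 order by 1;
```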

Nothing shows it better than an example. Below you can see how I check the execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in the last two weeks and reporting the average values over 2-day intervals. Then I take a closer look at the last four days of the same SQL by summarizing the data over 6-hour intervals. Note that the TIME column shows the beginning of the interval.
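Assuming the three parameters are passed positionally as described above, the two reports below would be produced by calls like these (a sketch; check the script header for the exact usage):

```sql
SQL> @awr_sqlid_perf_trend.sql fd7rrqkn1k2xb 14 48
SQL> @awr_sqlid_perf_trend.sql fd7rrqkn1k2xb 4 6
```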


TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
16.10.2013 00:00:00         351              195.571           74.995           .097           .000           .000           .000           134417.570      21319182.291        293731.556          304434.305
18.10.2013 00:00:00         364               91.225           47.474          1.687           .000           .000           .002           141140.228      20364053.544        270107.745          273343.709
20.10.2013 00:00:00         542               20.686            9.378           .004           .000           .000           .000           146436.875       4597922.220             3.168                .000
22.10.2013 00:00:00         531               25.060           12.086           .161           .000           .000           .000           146476.605       6026729.224         23999.684           23998.859
24.10.2013 00:00:00         542               51.611           40.675          1.880           .000           .000           .000           146814.220      21620264.039        287994.862          287994.701
26.10.2013 00:00:00         534               39.949           26.688          1.050           .000           .000           .000           147099.275      14081016.607        159704.463          159704.418
28.10.2013 00:00:00         245               37.837           29.384          1.150           .000           .000           .000           147135.216      15505533.959        179244.437          179244.367

7 rows selected.

 


TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
26.10.2013 00:00:00          72               19.209            9.439           .000           .000           .000           .000           147076.000       4623816.597              .111                .000
26.10.2013 06:00:00          72               15.391            9.401           .000           .000           .000           .000           147086.403       4624153.819              .000                .000
26.10.2013 12:00:00          72               14.022            9.351           .000           .000           .000           .000           147099.000       4624579.639              .000                .000
26.10.2013 18:00:00          55               48.174           35.723          1.575           .000           .000           .000           147099.000      19192781.582        243584.055          243584.055
27.10.2013 00:00:00          72               76.723           43.350          2.116           .000           .000           .000           147099.000      23258326.875        314445.111          314445.111
27.10.2013 06:00:00          72               64.921           43.914          2.084           .000           .000           .000           147107.542      23258506.028        315673.000          315673.000
27.10.2013 12:00:00          72               52.567           43.383          2.041           .000           .000           .000           147116.000      23258739.403        315673.000          315673.000
27.10.2013 18:00:00          47               25.522           18.095           .523           .000           .000           .000           147117.532       9382873.851         80597.702           80597.362
28.10.2013 00:00:00          65               17.645            9.384           .000           .000           .000           .000           147120.000       4625354.262              .000                .000
28.10.2013 06:00:00          19               17.571            9.451           .000           .000           .000           .000           147122.421       4625411.263              .000                .000
28.10.2013 12:00:00           6               14.083            9.645           .000           .000           .000           .000           147208.167       4629315.167              .000                .000
28.10.2013 18:00:00          48               42.173           35.208          1.509           .000           .000           .000           147236.375      18606643.833        229433.750          229433.750
29.10.2013 00:00:00          72               53.015           43.517          2.022           .000           .000           .000           147245.125      23265547.847        314507.319          314507.083
29.10.2013 06:00:00          30               52.181           43.638          1.932           .000           .000           .000           147250.300      23265839.767        303949.000          303949.000
29.10.2013 12:00:00           5               59.576           43.836          1.177           .000           .000           .000           144049.800      23267109.200        227814.000          227814.000

15 rows selected.

I checked this SQL because the users reported inconsistent performance, and that can be observed in the outputs above. Take a look! The number of rows processed during each execution of the SQL doesn’t change – it’s always around 147K – but look at the disk reads and the direct writes! These values can be around zero, but then they suddenly jump up to 300K, and when they do, the buffer gets increase too and the CPU time goes up from 9 seconds to 43. Based on the information above, it looks like two different execution plans could be involved, and bind variable peeking could be causing one or the other plan to become the active plan.
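One way to confirm the two-plans theory is to group the same AWR data by plan hash value. This follow-up query is my own illustration, not part of the published scripts:

```sql
-- Hypothetical follow-up check: per-plan per-execution averages for one SQL_ID.
-- If two plan hash values show very different buffer gets and elapsed times,
-- the inconsistent performance is likely plan-related.
select plan_hash_value,
       sum(executions_delta) executions,
       round(sum(elapsed_time_delta) / 1e6
             / nullif(sum(executions_delta), 0), 3) elapsed_time_s_1exec,
       round(sum(buffer_gets_delta)
             / nullif(sum(executions_delta), 0), 3) buffer_gets_1exec
  from dba_hist_sqlstat
 where sql_id = 'fd7rrqkn1k2xb'
 group by plan_hash_value
 order by executions desc;
```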
Additionally, you can use the same script to check how the execution statistics of a SQL change over time. Does the elapsed time increase? Does the number of processed rows or the number of buffer gets per execution change?

Wait event performance

The script awr_wait_trend.sql shows how the wait counts and wait durations for a particular event change over time. Like the previous script, it requires three parameters, only instead of a SQL ID you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT.

I typically use this script in two situations:

  • To check whether a particular wait event performs worse when an overall performance problem is reported (usually I look at I/O events)
  • To illustrate how an implemented change improved the situation

The example below shows how the performance of the log file parallel write event changed over three weeks. On October 19th we moved the redo logs to dedicated high-performance LUNs; before that, the two members of each redo log group were located on a saturated LUN together with all the data files.
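Assuming positional parameters again, the output below would come from a call like this (the event name contains spaces, so it is quoted; a sketch, not verified against the script header):

```sql
SQL> @awr_wait_trend.sql "log file parallel write" 21 24
```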


TIME                EVENT_NAME                       TOTAL_WAITS   TOTAL_TIME_S    AVG_TIME_MS
------------------- ---------------------------- --------------- -------------- --------------
09.10.2013 00:00:00 log file parallel write              4006177      31667.591          7.905
10.10.2013 00:00:00 log file parallel write              3625342      28296.640          7.805
11.10.2013 00:00:00 log file parallel write              3483249      31032.324          8.909
12.10.2013 00:00:00 log file parallel write              3293462      33351.490         10.127
13.10.2013 00:00:00 log file parallel write              2871091      36413.925         12.683
14.10.2013 00:00:00 log file parallel write              3763916      30262.718          8.040
15.10.2013 00:00:00 log file parallel write              3018760      28262.172          9.362
16.10.2013 00:00:00 log file parallel write              3303205      31062.276          9.404
17.10.2013 00:00:00 log file parallel write              3012105      31831.491         10.568
18.10.2013 00:00:00 log file parallel write              2692697      26981.966         10.020
19.10.2013 00:00:00 log file parallel write              1038399        512.950           .494
20.10.2013 00:00:00 log file parallel write               959443        427.554           .446
21.10.2013 00:00:00 log file parallel write              1520444        606.580           .399
22.10.2013 00:00:00 log file parallel write              1618490        655.873           .405
23.10.2013 00:00:00 log file parallel write              1889845        751.216           .398
24.10.2013 00:00:00 log file parallel write              1957384        760.656           .389
25.10.2013 00:00:00 log file parallel write              2204260        853.691           .387
26.10.2013 00:00:00 log file parallel write              2205783        856.731           .388
27.10.2013 00:00:00 log file parallel write              2033199        785.785           .386
28.10.2013 00:00:00 log file parallel write              2439092        923.368           .379
29.10.2013 00:00:00 log file parallel write              2233614        840.628           .376

21 rows selected.

Visualizing the data from output like that is easy too!


System Statistics

The last script in this set is awr_stat_trend.sql. It does the same thing for the system statistics collected in DBA_HIST_SYSSTAT as the previous scripts did for the performance of SQLs and wait events. The parameters are similar again: the name of the system statistic, the days to report, and the interval. I usually use the query to check how the redo size or the number of physical reads changes over time, but there is a huge number of statistics available (638 different statistics in 11.2.0.3), so I’m sure you’ll find your own reasons to use this script.
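Unlike DBA_HIST_SQLSTAT, DBA_HIST_SYSSTAT stores cumulative counter values, so consecutive snapshots have to be subtracted. A minimal sketch of that delta step (not the published script; days_history is assumed to be a substitution variable):

```sql
-- Sketch only: per-snapshot growth of one cumulative statistic.
-- Partitioning by STARTUP_TIME guards against counter resets after
-- instance restarts, which would otherwise produce negative deltas.
select cast(hs.begin_interval_time as date) time,
       hst.value
       - lag(hst.value) over (partition by hs.startup_time, hst.instance_number
                              order by hst.snap_id) delta_value
  from dba_hist_sysstat hst
  join dba_hist_snapshot hs
    on hs.snap_id = hst.snap_id
   and hs.instance_number = hst.instance_number
 where hst.stat_name = 'redo size'
   and hs.begin_interval_time >= trunc(sysdate) - &days_history + 1
 order by hst.snap_id;
```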


TIME                STAT_NAME                             VALUE
------------------- ------------------------- -----------------
27.10.2013 00:00:00 redo size                        1739466208
27.10.2013 04:00:00 redo size                        2809857936
27.10.2013 08:00:00 redo size                         648511376
27.10.2013 12:00:00 redo size                         533287888
27.10.2013 16:00:00 redo size                         704832684
27.10.2013 20:00:00 redo size                         819854908
28.10.2013 00:00:00 redo size                        2226799060
28.10.2013 04:00:00 redo size                        3875182764
28.10.2013 08:00:00 redo size                        1968024072
28.10.2013 12:00:00 redo size                        1125339352
28.10.2013 16:00:00 redo size                        1067175300
28.10.2013 20:00:00 redo size                         936404908
29.10.2013 00:00:00 redo size                        1758952428
29.10.2013 04:00:00 redo size                        3949193948
29.10.2013 08:00:00 redo size                        1715444632
29.10.2013 12:00:00 redo size                        1008385144
29.10.2013 16:00:00 redo size                         544946804

17 rows selected.

AWR is a gold mine, but you need the right tools for digging. I hope these scripts will be useful for you too!
P.S. You might have noticed the scripts are published on GitHub. Let me know if you find any issues using them, and perhaps one day you’ll find new versions of the scripts there.


Update (4-Nov-2013)

I’ve added the instance numbers to the outputs in all three scripts. This is how it looks now:


 INST TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
    1 28.10.2013 00:00:00         840                 .611             .014           .595           .007           .000           .000                1.000          1085.583           128.724                .000
      30.10.2013 00:00:00        1466                 .491             .011           .479           .005           .000           .000                1.000           976.001            88.744                .000
      01.11.2013 00:00:00         542                 .798             .023           .760           .025           .000           .000                1.000           896.978           114.196                .000
      03.11.2013 00:00:00         544                 .750             .021           .719           .017           .000           .000                1.000          1098.213           134.941                .000

    2 28.10.2013 00:00:00        1638                 .498             .017           .474           .013           .000           .000                1.001           953.514            96.287                .000
      30.10.2013 00:00:00        1014                 .745             .022           .712           .019           .000           .000                1.000          1034.249           131.057                .000
      01.11.2013 00:00:00        1904                 .633             .011           .624           .002           .000           .000                1.000          1045.668           104.568                .000
      03.11.2013 00:00:00         810                 .602             .017           .581           .010           .000           .000                1.000           929.778           108.998                .000


8 rows selected.

9 Responses to “Mining the AWR to Identify Performance Trends”

  • Hi,

    Thanks for sharing the scripts. As you said, AWR/ASH is a gold mine of data, and I think not all cases are covered by the Grid Control Performance Page / ASH analytics yet. Kyle Hailey started the ashmasters repository on GitHub – https://github.com/khailey/ashmasters – and I forked it (https://github.com/pioro/ashmasters) and added some scripts I’m using for AWR as well. I’m wondering if creating one repo where people can add / maintain ASH/AWR scripts wouldn’t be a good idea.

    regards,
    Marcin

    • Maris Elsins says:

      Hi Marcin,

      I think the idea of a single ASH/AWR repo for scripts is great, but I’m not sure how to implement it. I’ve just started using GitHub and I’m still not aware of what it’s capable of.

      If it’s a single repo, I would still want to remain in control of my own scripts and if someone else updates them I would like to be asked to review the change before committing it.
      Otherwise it might not take long until I don’t know how to use my own scripts. Can this be done using Github? (This would probably look something like a master repository with pointers to many different scripts from different users.)

      Maris.

  • Connor McDonald says:

    Suggestion to handle multiple instances…eg for wait events:

    set ver off pages 50000 lines 160 tab off
    undef event_name
    undef days_history
    undef interval_hours
    def event_name="&1"
    def days_history="&2"
    def interval_hours="&3"
    col time for a19
    col EVENT_NAME for a64
    col total_waits for 99999999999999
    col total_time_s for 999999999.999
    col avg_time_ms for 999999999.999
    select to_char(time,'DD.MM.YYYY HH24:MI:SS') time, instance_number inst, event_name,
    sum(delta_total_waits) total_waits,
    round(sum(delta_time_waited/1000000),3) total_time_s,
    round(sum(delta_time_waited)/decode(sum(delta_total_waits),0,null,sum(delta_total_waits))/1000,3) avg_time_ms
    from
    (select hse.snap_id, hse.instance_number,
    trunc(sysdate-&days_history+1)+trunc((cast(hs.begin_interval_time as date)-(trunc(sysdate-&days_history+1)))*24/(&interval_hours))*(&interval_hours)/24 time,
    EVENT_NAME,
    (lead(TOTAL_WAITS,1) over(partition by hs.STARTUP_TIME, hs.instance_number, EVENT_NAME order by hse.snap_id))-TOTAL_WAITS delta_total_waits,
    (lead(TIME_WAITED_MICRO,1) over(partition by hs.STARTUP_TIME, hs.instance_number, EVENT_NAME order by hse.snap_id))-TIME_WAITED_MICRO delta_time_waited
    from DBA_HIST_SYSTEM_EVENT hse, DBA_HIST_SNAPSHOT hs
    where hse.snap_id=hs.snap_id
    and hse.instance_number = hs.instance_number
    and hs.begin_interval_time>=trunc(sysdate)-&days_history+1
    and hse.EVENT_NAME like '&event_name')
    group by time, event_name, instance_number
    order by 2, 1;

    Not thoroughly checked, but hopefully you get the idea..

    Cheers
    Connor

  • Anand says:

    Thanks for sharing. It’s very useful.

  • Bert Scalzo says:

    Useful scripts – just one caveat. Accessing DBA_HIST views means you must pay for tuning and diagnostics packs. The data and mining it is great – just there is a cost associated that many might not realize.

  • Nice and very handy scripts to check a SQL’s trend. Thanks for sharing…

  • Hendra says:

    very nice share, useful scripts.

    Many Thanks,

    Hendra
