Traditional methods include looking at CPU, load, and memory statistics from sar or other sources. A better method, however, may be simply to let the Oracle database tell you where to concentrate your efforts. The data is all available in AWR; all that is necessary is to go find it.
The gold standard for determining database activity is the Average Active Sessions (AAS) metric, which appeared when Oracle introduced ASH and AWR as performance monitoring tools. AAS measures just how active the database is, and is defined as the "number of sessions that are either working or waiting" at a point in time.
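AAS can also be observed live, outside of AWR. As a sketch, the current values can be read from v$sysmetric (the metric name shown here is the documented one, but verify it on your release):

```sql
-- Current Average Active Sessions values from the in-memory metric view.
-- v$sysmetric typically holds one short-duration and one long-duration row.
select begin_time,
       end_time,
       value as average_active_sessions
from v$sysmetric
where metric_name = 'Average Active Sessions'
order by begin_time;
```

The same metric, retained historically, is what dba_hist_sysmetric_history exposes and what the first query below ranks on.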
Armed with that information, you can now dig into AWR and find the most active times from an AAS perspective. An initial attempt to find the top 10 most active periods used the following SQL, which ranks AWR snapshots by the maximum AAS value:
--aas-1.sql

with aas as (
   select snap_id, value, dbid, instance_number, rownum my_rownum
   from (
      select distinct snap_id, dbid, instance_number,
         max(value) over (partition by snap_id order by snap_id) value
      from dba_hist_sysmetric_history
      where metric_name = 'Average Active Sessions'
      order by value desc
   )
),
top10aas as (
   select sn.begin_interval_time begin_time,
      aas.instance_number,
      aas.snap_id,
      aas.value,
      aas.dbid
   from aas
   join dba_hist_snapshot sn
      on sn.snap_id = aas.snap_id
      and sn.instance_number = aas.instance_number
   where my_rownum <= 10
   order by value desc
)
select * from top10aas
That seemed to work well until some apparently spurious values showed up. A server with 32 cores showed an AAS of 3000+ during one AWR snapshot period. As there was no other evidence (CPU, memory, or I/O) to support that level of activity, the value was likely in error.
So rather than rely on the AAS metric, a different measure was chosen: the number of sessions per snapshot that are on CPU or waiting on a non-idle event, counted as DB time. Because AWR retains one ASH sample per active session every 10 seconds, each row in dba_hist_active_sess_history represents roughly 10 seconds of DB time, which is why the query below multiplies the sample count by 10. The ratio of db_time to elapsed_time is then used to determine the most active periods.
--aas-2.sql

with aas as (
   select snap_id, value, dbid, instance_number, rownum my_rownum
   from (
      select snap_id, round(db_time / elapsed_time,1) value, dbid, instance_number
      from (
         select distinct h.snap_id, h.instance_number, h.dbid,
            count(*) over (partition by h.snap_id, h.instance_number) * 10 db_time,
            (extract( day    from (s.end_interval_time - s.begin_interval_time) )*24*60*60)+
            (extract( hour   from (s.end_interval_time - s.begin_interval_time) )*60*60)+
            (extract( minute from (s.end_interval_time - s.begin_interval_time) )*60)+
            (extract( second from (s.end_interval_time - s.begin_interval_time))) elapsed_time
         from dba_hist_active_sess_history h
         join dba_hist_snapshot s
            on s.snap_id = h.snap_id
            and s.instance_number = h.instance_number
         where (wait_class is null or wait_class != 'Idle')
      )
      order by 2 desc
   )
),
top10aas as (
   select sn.begin_interval_time begin_time,
      aas.instance_number,
      aas.snap_id,
      aas.value,
      aas.dbid
   from aas
   join dba_hist_snapshot sn
      on sn.snap_id = aas.snap_id
      and sn.instance_number = aas.instance_number
   where my_rownum <= 10
   order by value desc
)
select * from top10aas
In hindsight, it might have been better to name the factored subquery topactivity rather than top10aas. On the other hand, keeping the same name means the two methods can be switched back and forth without affecting any dependent scripts.
Here are examples from an active test database. The first example uses the query on dba_hist_sysmetric_history:
SQL> @aas-1

BEGIN_TIME                     INSTANCE_NUMBER    SNAP_ID      VALUE       DBID
------------------------------ --------------- ---------- ---------- ----------
15-JUL-16 09.00.43.624 AM                    1      18087 294.076413 4107786006
21-JUL-16 11.00.59.752 AM                    1      18233 126.210386 4107786006
09-AUG-16 05.00.40.556 AM                    1      18683 77.0418398 4107786006
...
Next are the results of the query that is considering DB Time:
SQL> @aas-2

BEGIN_TIME                     INSTANCE_NUMBER    SNAP_ID      VALUE       DBID
------------------------------ --------------- ---------- ---------- ----------
12-AUG-16 01.00.52.943 AM                    1      18751       27.7 4107786006
28-JUL-16 01.00.28.734 AM                    1      18391       26.4 4107786006
05-AUG-16 01.00.43.076 AM                    1      18583       25.4 4107786006
...
You probably noticed there is not a 1:1 match between the periods chosen by these two methods. Because the metrics differ, there never will be; both methods are valid, however, and both will find periods of high activity.
Now that you can identify the top 10 activity periods, what is the next step? You might run AWR/ADDM reports, but you must first ensure the snapshots are not deleted by the retention policy.
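For an individual period, a text AWR report can be generated directly in SQL*Plus without the interactive awrrpt.sql script. This is a sketch using the documented dbms_workload_repository.awr_report_text table function; the dbid, instance number, and snapshot IDs below are placeholders taken from the example output and must be replaced with your own:

```sql
-- Sketch: text AWR report for one identified period.
-- dbid / instance / snapshot values here are examples only.
set pagesize 0 linesize 200 long 1000000

select output
from table(
   dbms_workload_repository.awr_report_text(
      l_dbid     => 4107786006,  -- dbid from the top10 output
      l_inst_num => 1,           -- instance_number
      l_bid      => 18751,       -- begin snapshot id
      l_eid      => 18752        -- end snapshot id
   )
);
```

The function returns the same report awrrpt.sql would produce, one line per row, which makes it easy to spool for all 10 periods in a loop.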
The gold standard here is to create an AWR baseline for each period. The following SQL fragment, based on aas-2.sql, identifies those snapshots:
with aas as (
   select begin_snap_id, end_snap_id, value, dbid, instance_number, rownum my_rownum
   from (
      select begin_snap_id,
         lead(begin_snap_id,1) over (partition by dbid, instance_number order by begin_snap_id) end_snap_id,
         round(db_time / elapsed_time,1) value,
         dbid, instance_number
      from (
         select distinct h.snap_id begin_snap_id, h.instance_number, h.dbid,
            count(*) over (partition by h.snap_id, h.instance_number) * 10 db_time,
            (extract( day from (s.end_interval_time - s.begin_interval_time) )*24*60*60)+...
         from dba_hist_active_sess_history h
         join dba_hist_snapshot s
            on s.snap_id = h.snap_id
            and s.instance_number = h.instance_number
            and (h.wait_class is null or h.wait_class != 'Idle')
            and s.end_interval_time > systimestamp - 5
      )
      order by value desc
   )
),
top10 as (
   select sn.begin_interval_time begin_time,
      aas.instance_number,
      aas.begin_snap_id,
      aas.end_snap_id,
      aas.value,
      aas.dbid
   from aas
   join dba_hist_snapshot sn
      on sn.snap_id = aas.begin_snap_id
      and sn.dbid = aas.dbid
      and sn.instance_number = aas.instance_number
   where my_rownum <= 10
   order by value desc
)
Using the top10 logic above, we can now run create-baselines.sql. This script uses dbms_workload_repository.create_baseline to automatically preserve the data.
set serveroutput on size unlimited

var n_expire_days number
exec :n_expire_days := 1 -- change to something sensible

declare
   v_baseline_pfx varchar2(30) := 'dw';
   v_baseline_name varchar2(128);
   i_expire_days integer := :n_expire_days;
   e_baseline_exists exception;
   pragma exception_init(e_baseline_exists, -13528);
begin
   for aasrec in (
      @@top10
      select begin_time, instance_number, begin_snap_id, end_snap_id, value, dbid
      from top10
   )
   loop
      begin
         v_baseline_name := v_baseline_pfx || '_'
            || to_char(aasrec.begin_snap_id) || '_'
            || to_char(aasrec.begin_time,'yyyymmdd-hh24:mi:ss');
         dbms_workload_repository.create_baseline(
            start_snap_id => aasrec.begin_snap_id,
            end_snap_id   => aasrec.end_snap_id,
            baseline_name => v_baseline_name,
            dbid          => aasrec.dbid,
            expiration    => i_expire_days
         );
      exception
         when e_baseline_exists then
            dbms_output.put_line('!!Baseline ' || v_baseline_name || ' already exists');
      end;
   end loop;
end;
/
Once the PL/SQL block completes, you can confirm the preserved snapshots in dba_hist_baseline:
SQL> @show-awr-baselines

BASELINE_NAME               BASELINE TYPE START_SNAP_ID START_SNAP_TIME      EXPIRATION
--------------------------- ------------- ------------- -------------------- --------------------
dw_31888_20160723-08:00:10  STATIC                31888 2016-07-23 09:00:24  2016-08-31 12:45:03
dw_31946_20160725-18:00:12  STATIC                31946 2016-07-25 19:00:26  2016-08-31 12:45:03
...
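If the show-awr-baselines script is not at hand, a plain query against dba_hist_baseline returns much the same information (the column list here is a sketch; check the view definition on your release):

```sql
-- Baselines created manually (type STATIC), oldest first.
select baseline_name,
       baseline_type,
       start_snap_id,
       start_snap_time,
       expiration      -- expiration in days; null means never expires
from dba_hist_baseline
where baseline_type = 'STATIC'
order by start_snap_id;
```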
The AWR snapshots preserved by the new baselines will not be aged out by the standard AWR retention policy, allowing ample time to run reports and drill down to the root cause of performance issues. The code from this article may also be found at https://github.com/jkstill/oracle-baselines.
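One housekeeping note: when a baseline is no longer needed before its expiration, it can be removed explicitly with dbms_workload_repository.drop_baseline. A minimal sketch (the baseline name below is one of the example names and is a placeholder):

```sql
begin
   dbms_workload_repository.drop_baseline(
      baseline_name => 'dw_31888_20160723-08:00:10',  -- example name
      cascade       => false  -- true would also purge the protected snapshots
   );
end;
/
```

Leaving cascade at false releases the baseline but lets the normal retention policy decide when the underlying snapshots are purged.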