Oracle Database Consulting Services
Ready to optimize your Oracle Database for the future?
This post is about a Statspack anomaly I discovered while analyzing execution trends of a particular SQL on one of our clients’ databases. I decided to share it, since its appearance can be misleading. The database in question was an Oracle EE version 11.2.0.3. However, as Statspack is no longer maintained, the same problem can be reproduced on 19c. In this post, I’ll also verify whether the same problem shows up in AWR (spoiler: it doesn’t).
Mining the Statspack repository for a certain SQL_ID using this simple script revealed a sharp increase in its number of executions —a SQL which is usually executed once or twice per snapshot and has apparently been executed 409 times:
SNAP_ID EXECUTIONS ELAPSED_TIME_SEC_TOTAL DISK_READS_PER_EXEC BUFFER_GETS_PER_EXEC ROWS_PROCESSED_PER_EXEC -------- ---------- ---------------------- ------------------- -------------------- ----------------------- 31451 2 1126.48 1767896 1768738 10904478 31452 2 1579.11 1760511.5 1761343.5 10904478 31453 1 498.76 1767896 1758346 10904478 31455 409 481113.8 1752410.67 2256430.3 10886122.5
A proper Statspack report reported the same numbers. Cross-checking with the application side, there were indications that this couldn’t be true.
As Statspack’s code is available in $ORACLE_HOME/rdbms/admin, we can easily verify how snapshots are created (spcpkg.sql) and also how reports are generated (sprepins.sql).
The PL/SQL procedure “SNAP_SQL” located in STATSPACK’s package body contains the INSERT statement which captures SQL execution statistics. It keeps only the relevant parts for this investigation:
insert into stats$sql_summary
select
l_snap_id
, p_dbid
, p_instance_number
, max(substrb(sql_text,1,31)) text_subset
, max(sql_id) sql_id
.
.
, sum(executions) executions
, sum(px_servers_executions) px_servers_executions
.
.
from v$sql sql
where is_obsolete = 'N'
and sql_id in (select sql_id
from stats$v$sqlstats_summary sqlstats
where ( buffer_gets > l_buffer_gets_th
or disk_reads > l_disk_reads_th
or parse_calls > l_parse_calls_th
or executions > l_executions_th
or sharable_mem > l_sharable_mem_th
or version_count > l_version_count_th
)
)
group by old_hash_value, address;
Notice that the SQL aggregates and stores execution statistics for a particular by “old_hash_value, address”:
Since the “group by” doesn’t also include the CHILD_ADDRESS column, the SQL execution statistics are aggregated across all of the child cursors. It’s also worth noting that the values in stats$sql_summary are cumulative. This means we have to calculate the difference between two snapshots to obtain the actual delta for a particular execution statistic.
Report generation
This brings us to the code responsible for calculating the number of executions between two snapshots in a Statspack report, available in $ORACLE_HOME/rdbms/admin/sprepins.sql. Here again, is only the relevant part:
.
.
select
old_hash_value
, text_subset
, sum(case
when snap_id = &&begin_snap and prev_snap_id = -1
then 0
else
case when (address != prev_address)
or (executions < prev_executions)
then executions
else executions - prev_executions
end
end) delta_executions
.
.
) delta_rows_processed
from (select snap_id
, old_hash_value
, text_subset
.
.
, executions
,(lag(executions, 1, 0)
over (partition by old_hash_value
, dbid
, instance_number
order by snap_id)) prev_executions
) prev_rows_processed
from stats$sql_summary s
where s.snap_id between &&begin_snap and &&end_snap
and s.dbid = &&dbid
and s.instance_number = &&inst_num
)
group by old_hash_value
, text_subset
, module
The relevant part for us is the logic in the ELSE part of the CASE statement. It returns the “executions” value stored in stats$sql_summary when:
In all other cases the “executions – prev_executions” figure is reported, which means we simply get a delta between two snapshots. This is the most common scenario.
The above logic works until there’s a scenario like this:
And this is exactly what happened on the database I was analyzing. Checking the contents of stats$sql_summary for this particular SQL, we can see that SNAP_ID=31454 reports 410 executions, however the next snapshot reports 409 executions:
SQL> select snap_id, executions from stats$sql_summary where sql_id = '1v5wcft51pmy1' order by snap_id;
SNAP_ID EXECUTIONS
---------- ----------
31451 407
31452 409
31453 410
31454 410 <--
31455 409 <--
31456 409
31457 410
Knowing how Statspack collects and reports data from stats$sql_summary, I checked the status in v$sql and there were various child cursors for this particular SQL:
SQL> SELECT child_number, old_hash_value, address, child_address, first_load_time, last_load_time, last_active_time, executions, invalidations, parse_calls
FROM v$sql
WHERE sql_id = '1v5wcft51pmy1'
ORDER BY 1;
CHILD_NUMBER OLD_HASH_VALUE ADDRESS CHILD_ADDRESS FIRST_LOAD_TIME LAST_LOAD_TIME LAST_ACTIVE_TIME EXECUTIONS INVALIDATIONS PARSE_CALLS
------------ -------------- ---------------- ---------------- ------------------- ------------------- -------------------- ---------- ------------- -----------
0 1140771677 000000D13F779218 000000D139974CF8 2021-05-20/23:11:33 2021-09-01/07:21:49 01.sep.2021 21:13:29 29 0 339
1 1140771677 000000D13F779218 000000D139E94B40 2021-05-20/23:11:33 2021-09-01/21:41:37 02.sep.2021 08:03:50 7 0 85
4 1140771677 000000D13F779218 000000D139E0EC88 2021-05-20/23:11:33 2021-08-26/22:11:38 27.aug.2021 09:43:43 10 0 123
7 1140771677 000000D13F779218 000000D139C1B398 2021-05-20/23:11:33 2021-08-28/07:26:53 28.aug.2021 19:43:24 26 0 304
10 1140771677 000000D13F779218 000000D13FAA3E40 2021-05-20/23:11:33 2021-08-29/21:41:34 30.aug.2021 07:25:36 6 0 69
11 1140771677 000000D13F779218 000000D13838C840 2021-05-20/23:11:33 2021-08-30/08:38:21 30.aug.2021 21:44:38 29 0 317
12 1140771677 000000D13F779218 000000D137BDC5E0 2021-05-20/23:11:33 2021-08-31/08:28:23 31.aug.2021 21:13:29 29 0 338
7 rows selected.
My theory about why the execution count decreased from 410 to 409 in stats$sql_summary is that one or more of the child cursors was purged from the library cache between snapshots 31454 and 31455. When taking snapshot 31455, the “sum(executions)” over all child cursors returned 409 and that was stored in stats$sql_summary. A further indication is that the values in the CHILD_NUMBER column have some missing values, meaning child cursors were purged from the library cache. When generating the Statspack report, the “(executions < prev_executions)” condition became true, so the 409 executions were reported.
Note: The values from v$sql displayed above were not taken immediately after SNAP_ID=31455 was taken, so the “sum(executions)” can’t be directly compared. The important takeaway here is the number of child cursors and the missing values in the child_number column.
The main motivation for reproducing the problem is to verify the above theory and perhaps even more importantly, check how AWR behaves in such situations.
The plan to reproduce the issue was relatively simple; create a parent cursor with many child cursors, flush only some of them from the library cache and check what Statspack and AWR report.
Purging only some of the cursors turned out as the challenging part. At first I tried those techniques:
Eventually, I opted for an unscientific approach— generating space pressure on the shared pool by heavily hard parsing and waiting for some of my observed child cursors to be purged out automatically by Oracle.
The script I used to execute the whole test is available here and the full log of the test is here. (Note that if you intend to re-run it, it might need a bit of tweaking depending on the sizes of your SGA and shared pool. The test was executed on a Oracle 19.5 EE running on Linux x64. Also, because of the heavy hard parsing, you should run it in an isolated database.) Finally, the observed SQL was selected from a real table because I wanted to check how other statistics behave. Here’s what we could use to speed up the test:
qtext="select /*"$(date)"*/ 1 from dual"
Description of the most relevant steps in the test script:
CHILD_NUMBER EXECUTIONS
------------ ----------
0 100
1 100
2 100
3 100
.
.
197 100
198 100
199 100
200 rows selected.
CNT SUM(EXECUTIONS)
---------- ---------------
200 20000
At this point there are 200 child cursors, each with 100 executions. Next, a Statspack (SNAP_ID=572) and AWR (SNAP_ID=312) snapshot is created.
5. Flood the shared pool by hard parsing in a loop. This is achieved by generating and executing unique SQLs in a loop. Exit the loop as soon as the number of child cursors for our observed SQL decreases. After that, check the number of child cursors in v$sql:
** Child cursor status from v$sql after the number of child cursors decreased: **
COUNT(*) SUM(EXECUTIONS)
---------- ---------------
41 4100
At this point, 159 child cursors were purged and there are only 41 left in the library cache.
6. Set optimizer_index_cost_adj=1000 (a value which wasn’t used yet) in order to force a new child cursor creation, and execute the observed SQL only once. The number of child cursors increases from 41 to 42 and the total number of executions increases by one to 4101:
** Child cursor status from v$sql: **
COUNT(*) SUM(EXECUTIONS)
---------- ---------------
42 4101
7. Generate another set of Statspack (SNAP_ID = 573) and AWR (SNAP_ID=313) snapshots.
Note: On some occasions, the act of creating Statspack and AWR snapshots purges some of the observed child cursors out of the library cache. We need to check the figures reported in the Statspack and AWR reports so they match with the above output, otherwise the test needs to be repeated.
Finally, we can verify the data captured in AWR and Statspack repositories.
SNAP_ID EXECUTIONS
---------- ----------
572 20000
573 4101
Excerpt from the Statspack report for snapshots 572-573:
CPU CPU per Elapsd Old
Time (s) Executions Exec (s) %Total Time (s) Buffer Gets Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
16.28 4,101 0.00 366.4 16.32 6,393,462 1915833037
Module: SQL*Plus
select /*Wed Sep 22 17:33:31 CEST 2021*/ max(t_1.object_id) from t1 t_1
Observation: Even though there was only one execution of our observed SQL between snapshots 572 and 573, Statspack reports 4101 executions. This is what most likely happened also on the production database, and we just reproduced it.
AWR doesn’t have the same problem as Statspack, as it correctly reports only one execution for SNAP_ID=313:
SNAP_ID NODE BEGIN_INTERVAL_TIME SQL_ID PLAN_HASH_VALUE EXECS ROWS_PROCESSED ROWS_PROCESSED_PER_EXEC AVG_ETIME AVG_LIO AVG_PIO
---------- ------ ------------------------------ ------------- --------------- ------------ -------------- ----------------------- ------------ -------------- ----------
312 1 22-SEP-21 05.33.32.745 PM g96vqh5wyw7q9 3724264953 20,000 20000 1 .004 1,559.0 0
313 1 22-SEP-21 05.34.51.241 PM g96vqh5wyw7q9 1 1 1 .005 1,562.0 0
The code to capture AWR snapshots isn’t as easily accessible as Statspack’s. Suffice to say that when researching how AWR captures the snapshots while inserting into the WRH$_SQLSTAT table (which is the table underlying the DBA_HIST_SQLSTAT view), it does so by running an INSERT INTO SELECT from X$KGLCURSOR_CHILD_SQLIDPH, which stores data for each child cursor. Therefore, it’s likely that runtime statistics at the child cursor level are taken into account when generating the AWR snapshots.
All of the files generated during this test are uploaded to the GitHub repository.
In cases where only some of the child cursors are purged from library cache, Statspack can report an erroneously high number of executions for a particular SQL. AWR doesn’t seem to suffer from the same problems.
Be aware that in case you try to re-run the test with a higher number of child cursors, you might get “strange” results in AWR. Below is what AWR reports when running the test with 500 child cursors (which is still below the _cursor_obsolete_threshold=1024 limit in 19c). Despite there were 500 child cursors with 50000 executions in v$sql, AWR didn’t capture their runtime statistics:
SQL> SELECT ss.snap_id, begin_interval_time, plan_hash_value, executions_delta, rows_processed_delta, buffer_gets_delta, disk_reads_delta, elapsed_time_delta
FROM dba_hist_sqlstat s, dba_hist_snapshot ss
WHERE sql_id = 'fs76gd8ck6gft'
AND ss.snap_id = S.snap_id
AND ss.instance_number = S.instance_number
ORDER BY 1, 2, 3;
SNAP_ID BEGIN_INTERVAL_TIME PLAN_HASH_VALUE EXECUTIONS_DELTA ROWS_PROCESSED_DELTA BUFFER_GETS_DELTA DISK_READS_DELTA ELAPSED_TIME_DELTA
---------- ---------------------------------------- --------------- ---------------- -------------------- ----------------- ---------------- ------------------
341 24-SEP-21 01.53.41.341 PM 0 199321839
342 24-SEP-21 01.57.03.885 PM 0 6858
This behavior is described in My Oracle Support note: “Why “Executions” and “Elap per Exec(s)” are ‘Blank’ in AWR for Some SQL Statements (Doc ID 1522547.1)”:
“This is expected behavior. Since 10.2, “Executions” and “Elap per Exec(s)” statistics of high version count cursor(> 200 by default) will no longer be collected.”
Ready to optimize your Oracle Database for the future?