DML by Unique Index Supposedly Affecting Many Rows
This blog post is based on a real customer case and illustrates how v$sql reports execution statistics for certain types of DML operations (I purposely haven't named which types … yet). Those statistics can be misleading when troubleshooting a performance problem, and the goal of this post is to hopefully save you some time if you encounter a similar case.
First I'll present some observations from a long-running DELETE statement that was in progress. Some of you will probably recognize the cause of this behavior immediately; nevertheless, I'll outline the troubleshooting steps based on those observations and present the root cause at the end of the post.
Scenario description
There was a long-running DELETE statement on a 12.1 database. It was a two-node RAC, but that isn't relevant in this case. When I checked what the session was doing, I could see it was mostly waiting on “db file sequential read”, which can have various causes. Its SQL text and execution plan showed it was executing a DELETE by primary key:
DELETE FROM SOMETBLE WHERE SOMETBLE_ID = :B2

Plan hash value: 3523556778

-----------------------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |            |       |       |     1 (100)|          |
|   1 |  DELETE             | SOMETBLE   |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN | PK_SOMETBL |     1 |   193 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SOMETBLE_ID"=:B2)
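As a side note, the plan above can be pulled straight from the library cache with DBMS_XPLAN. A minimal sketch (child number 0 is an assumption here; passing NULL reports all children):

SQL> SELECT *
       FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('6xs07f3vucupq', 0, 'TYPICAL'));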
If we check the DELETE’s statistics to get an insight into the amount of data it processed so far, we can see:
SQL> select child_number,
            plan_hash_value,
            executions,
            round(elapsed_time/executions/1e6,2) elapsed_time_per_exec,
            round(cpu_time/elapsed_time*100,2) "cpu_time_%",
            round(concurrency_wait_time/elapsed_time*100,2) "concurrency_wait_time_%",
            round(cluster_wait_time/elapsed_time*100,2) "cluster_wait_time_%",
            round(user_io_wait_time/elapsed_time*100,2) "user_io_wait_time_%",
            round(rows_processed/executions,2) rows_processed_per_execution,
            round(buffer_gets/executions,2) buffer_gets_per_execution
       from v$sql
      where sql_id = '6xs07f3vucupq'
      order by 3 desc;

CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS ELAPSED_TIME_PER_EXEC CPU_TIME_% CONCURRENCY_WAIT_TIME_% CLUSTER_WAIT_TIME_% USER_IO_WAIT_TIME_% ROWS_PROCESSED_PER_EXECUTION BUFFER_GETS_PER_EXECUTION
------------ --------------- ---------- --------------------- ---------- ----------------------- ------------------- ------------------- ---------------------------- -------------------------
           0      3523556778         26                323.34       9.43                       0                1.11               92.21                     75982.46                5987362.08

SQL>
Because the values v$sql reports are cumulative, it's necessary to calculate deltas between two points in time to see how much work was done in that period. The above is a snapshot I gathered while the problem was occurring, and the only figure that kept growing between consecutive runs of the query was the number of buffer gets (last column in the output). At the time that was enough to continue the investigation.
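For a quick one-off check, the deltas can also be eyeballed without any tooling by sampling v$sql twice from an anonymous block. This is just a rough sketch under a few assumptions: the sql_id is the one shown above, the session can see v$sql, and it has EXECUTE on DBMS_LOCK (on 18c and later, DBMS_SESSION.SLEEP avoids that grant):

SET SERVEROUTPUT ON

DECLARE
  l_gets1 NUMBER; l_rows1 NUMBER; l_exec1 NUMBER;
  l_gets2 NUMBER; l_rows2 NUMBER; l_exec2 NUMBER;
BEGIN
  -- first snapshot (summed across all children of the cursor)
  SELECT SUM(buffer_gets), SUM(rows_processed), SUM(executions)
    INTO l_gets1, l_rows1, l_exec1
    FROM v$sql
   WHERE sql_id = '6xs07f3vucupq';

  DBMS_LOCK.SLEEP(60);  -- sample interval in seconds

  -- second snapshot
  SELECT SUM(buffer_gets), SUM(rows_processed), SUM(executions)
    INTO l_gets2, l_rows2, l_exec2
    FROM v$sql
   WHERE sql_id = '6xs07f3vucupq';

  DBMS_OUTPUT.PUT_LINE('buffer_gets delta    : ' || (l_gets2 - l_gets1));
  DBMS_OUTPUT.PUT_LINE('rows_processed delta : ' || (l_rows2 - l_rows1));
  DBMS_OUTPUT.PUT_LINE('executions delta     : ' || (l_exec2 - l_exec1));
END;
/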
Side note: script to obtain deltas from v$sqlstats without any supporting PL/SQL objects
The above prompted me to adapt Tanel Poder's topsql.sql script using a technique described in this Jonathan Lewis blog post. The modified script is available here. The main modification is that the script doesn't require a supporting PL/SQL package, which is useful because some clients (like the one I was working with in this case) don't allow any modifications to their databases. Using the modified script on my sandbox DB while reproducing the above problem, I could easily get the deltas for the DELETE:
SQL> @topsql buffer_gets/decode(executions,0,1,executions),rows_processed/decode(executions,0,1,executions),executions 10

BUFFER_GETS/DECODE(EXECUTIONS,0,1,EXECUTIONS) ROWS_PROCESSED/DECODE(EXECUTIONS,0,1,EXECUTIONS) EXECUTIONS SQL_ID        PLAN_HASH_VALUE SQL_TEXT
--------------------------------------------- ------------------------------------------------ ---------- ------------- --------------- ------------------------------
                                      6226002                                                0          0 fwgw3xdy86nfy               0 BEGIN test_pkg.purge_t1; END;
                                      6226002                                                0          0 0abrc1f5qqyza      1101221491 DELETE FROM T1 WHERE rn=:B2
Going back to the figures from the customer’s environment
The DELETE was still running, and since the cursor had been loaded the statistics kept accumulating: there were 26 executions which on average deleted 75,982 rows per execution. How is that possible when the DELETE accesses data by primary key? Shouldn't the number of executions be >= rows_processed? Additionally, the number of buffer gets per execution (5,987,362, or roughly 45.7 GB assuming the default 8 KB block size) doesn't seem “reasonable” for deleting one row from a table. Given the information collected so far, and assuming for a moment that the v$sql.executions figure might for some reason not be reported correctly, I continued with the investigation.
It's possible there was some recursive SQL causing those buffer visits. This could be related to triggers or foreign key validation, but the table had neither defined (the dictionary checks and the tracing call are sketched after the trace excerpt below). The table had 80 columns, none of which was a LOB (large object). It had 25 indexes, many of them multi-column, so perhaps something odd was going on with index maintenance (although that still wouldn't explain why rows_processed exceeded executions). After enabling the 10046 trace on the session, the trace file contained only “db file sequential read” waits against various objects (obj#), indicating single-block reads. There were no other call types (EXEC, FETCH, BINDS) and no recursive SQL:
WAIT #140386332697512: nam='db file sequential read' ela= 509 file#=25 block#=1892854 blocks=1 obj#=227531 tim=5499116275505
WAIT #140386332697512: nam='db file sequential read' ela= 457 file#=26 block#=1961814 blocks=1 obj#=227532 tim=5499116276005
WAIT #140386332697512: nam='db file sequential read' ela= 186 file#=25 block#=1983830 blocks=1 obj#=227533 tim=5499116276235
WAIT #140386332697512: nam='db file sequential read' ela= 569 file#=25 block#=2011326 blocks=1 obj#=227534 tim=5499116276837
WAIT #140386332697512: nam='db file sequential read' ela= 940 file#=26 block#=2030996 blocks=1 obj#=227535 tim=5499116277849
WAIT #140386332697512: nam='db file sequential read' ela= 418 file#=26 block#=2098394 blocks=1 obj#=227537 tim=5499116278308
WAIT #140386332697512: nam='db file sequential read' ela= 474 file#=26 block#=2148990 blocks=1 obj#=227539 tim=5499116278867
WAIT #140386332697512: nam='db file sequential read' ela= 242 file#=25 block#=2212213 blocks=1 obj#=227541 tim=5499116279149
WAIT #140386332697512: nam='db file sequential read' ela= 586 file#=26 block#=2243375 blocks=1 obj#=227543 tim=5499116279766
WAIT #140386332697512: nam='db file sequential read' ela= 474 file#=26 block#=2275160 blocks=1 obj#=227544 tim=5499116280304
WAIT #140386332697512: nam='db file sequential read' ela= 614 file#=25 block#=2285500 blocks=1 obj#=227545 tim=5499116280977
WAIT #140386332697512: nam='db file sequential read' ela= 995 file#=25 block#=2322697 blocks=1 obj#=227546 tim=5499116282044
WAIT #140386332697512: nam='db file sequential read' ela= 432 file#=26 block#=2356469 blocks=1 obj#=227547 tim=5499116282504
WAIT #140386332697512: nam='db file sequential read' ela= 493 file#=25 block#=2357897 blocks=1 obj#=227548 tim=5499116283060
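As promised, here's a sketch of the dictionary checks and the tracing call mentioned above. The owner, table name, SID and SERIAL# are placeholders standing in for the redacted originals:

-- Any triggers defined on the table?
SELECT trigger_name, trigger_type, triggering_event, status
  FROM dba_triggers
 WHERE table_owner = 'SOMESCHEMA'
   AND table_name  = 'SOMETBLE';

-- Any child tables with foreign keys referencing the table's PK/UK?
SELECT c.owner, c.table_name, c.constraint_name
  FROM dba_constraints c
  JOIN dba_constraints p
    ON p.owner           = c.r_owner
   AND p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'
   AND p.owner           = 'SOMESCHEMA'
   AND p.table_name      = 'SOMETBLE';

-- Enable extended SQL trace (10046 with waits and binds) for the session
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    session_id => 1234,   -- SID of the session running the DELETE (placeholder)
    serial_num => 56789,  -- SERIAL# of that session (placeholder)
    waits      => TRUE,
    binds      => TRUE);
END;
/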
A quick summary of the trace file (without even filtering on “db file sequential read”) showed the waits were spread across various objects:
$ grep "obj#=" db_j001_4384.trc | cut -f12 -d' ' | sort | uniq -c 1378 obj#=224155 516 obj#=227528 662 obj#=227529 395 obj#=227530 742 obj#=227531 448 obj#=227532 733 obj#=227533 402 obj#=227534 713 obj#=227535 495 obj#=227536 504 obj#=227537 480 obj#=227538 923 obj#=227539 881 obj#=227540 444 obj#=227541 605 obj#=227543 431 obj#=227544 641 obj#=227545 403 obj#=227546 273 obj#=227547 289 obj#=227548 651 obj#=227549 301 obj#=230064 295 obj#=230065
Apart from the table segment, the above object IDs belonged to indexes defined on this table (names redacted):
SELECT owner, object_name, object_type
  FROM dba_objects
 WHERE object_id IN ( 224155, 227528, 227532, 227536, 227537, 227533,
                      227534, 227535, 227537, 227540, 227543, 227544,
                      227545, 227546, 227549, 230064, 224155, 227528,
                      227529 );

OWNER        OBJECT_NAME    OBJECT_TYPE
------------ -------------- -----------------------
SOMESCHEMA   T1             TABLE
SOMESCHEMA   I1             INDEX
SOMESCHEMA   I2             INDEX
SOMESCHEMA   I3             INDEX
SOMESCHEMA   I4             INDEX
SOMESCHEMA   I5             INDEX
SOMESCHEMA   I6             INDEX
SOMESCHEMA   I7             INDEX
SOMESCHEMA   I8             INDEX
SOMESCHEMA   I9             INDEX
SOMESCHEMA   I10            INDEX
SOMESCHEMA   I12            INDEX
SOMESCHEMA   I13            INDEX
SOMESCHEMA   I14            INDEX
SOMESCHEMA   I15            INDEX
SOMESCHEMA   I16            INDEX

16 rows selected.
Reason for this behavior
The values in v$session's PLSQL_ENTRY_OBJECT_ID and PLSQL_ENTRY_SUBPROGRAM_ID columns pointed to the PL/SQL packaged procedure in which the DELETE was defined (a query that performs this mapping is sketched at the end of this section). This code segment is responsible for executing the DELETE:
forall i in istart..istop save exceptions
  EXECUTE IMMEDIATE 'DELETE FROM '||p_table_name||' WHERE '||p_column_name||' = :B2'
    USING p_arrIDs(i);
We can see that the reason for this behavior is bulk processing, where one execution is reported for the whole array of processed values. Reproducing the above scenario on my sandbox DB confirmed it. I also observed (and this might be relevant if you encounter a similar scenario) that if you enable 10046 tracing while the DELETE is already running, only WAIT calls are written to the trace file. Let me show you what happens if you enable tracing before the DELETE starts, or while the DELETE is executed in a loop, e.g. using this sample code:
create or replace package body test_pkg is

  procedure purge_t1 is
    type t_idarray is table of number;
    ids t_idarray;
    cursor c1 is select rn from t1;
  begin
    open c1;
    loop
      fetch c1 bulk collect into ids limit 100000;
      exit when ids.count = 0;

      forall i in 1 .. ids.count
        execute immediate 'DELETE FROM T1 WHERE rn=:B2' using ids(i);
    end loop;
    close c1;  -- release the cursor
  end;

end;
/
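For completeness, the package body above needs a specification and a test table that aren't shown. Something along these lines will do; only T1, its RN column, and TEST_PKG.PURGE_T1 come from the listing above, while the row count, padding column, and index name are arbitrary:

-- Test table with a numeric key column RN (sized arbitrarily)
CREATE TABLE t1 AS
SELECT ROWNUM rn, RPAD('x', 100, 'x') padding
  FROM dual
CONNECT BY LEVEL <= 300000;

CREATE UNIQUE INDEX t1_pk ON t1(rn);

-- Package specification matching the body above
CREATE OR REPLACE PACKAGE test_pkg IS
  PROCEDURE purge_t1;
END test_pkg;
/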
In that case the trace file also reports the recursive SQL and the BIND values, since the DELETE is executed again on the next loop iteration. For example:
PARSING IN CURSOR #140712606682816 len=27 dep=1 uid=103 oct=7 lid=103 tim=3051559302 hv=2339077098 ad='64a47aa0' sqlid='0abrc1f5qqyza'
DELETE FROM T1 WHERE rn=:B2
END OF STMT
.
.
BINDS #140712606682816:
 Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=13 fl2=204001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7ffa350d1cf0  bln=22  avl=04  flg=09
  value=200066
BINDS #140712606682816:
 Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=13 fl2=204001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7ffa350d1d18  bln=22  avl=04  flg=09
  value=200067
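And, as mentioned at the start of this section, mapping a session to the top-level PL/SQL unit it's executing is a simple join between v$session and dba_procedures. A minimal sketch, with :sid standing in for the session of interest:

SELECT s.sid,
       p.owner,
       p.object_name,
       p.procedure_name
  FROM v$session      s,
       dba_procedures p
 WHERE s.sid           = :sid
   AND p.object_id     = s.plsql_entry_object_id
   AND p.subprogram_id = s.plsql_entry_subprogram_id;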
Final note
I looked for articles describing the above observations and found one from Jonathan Lewis describing a very similar scenario when analyzing a bulk UPDATE statement: https://jonathanlewis.wordpress.com/2017/03/21/deception/.
There is also a similar observation in this Connor McDonald post, where he analyzed the behavior of a bulk INSERT statement with tkprof:
https://blogs.oracle.com/oraclemagazine/a-fresh-look-at-auditing-row-changes.
I hope you found this post useful. If you have any questions, or thoughts, please feel free to leave them in the comments!