Performance Problems with Dynamic Statistics in Oracle 12c

Posted in: Technical Track

I’ve been making some tests recently with the new Oracle 12.1.0.2 In-Memory option and have been faced with an unexpected  performance problem.  Here is a test case:

create table tst_1 as
with q as (select 1 from dual connect by level <= 100000)
select rownum id, 12345 val, mod(rownum,1000) ref_id  from q,q
where rownum <= 200000000;

Table created.

create table tst_2 as select rownum ref_id, lpad(rownum,10, 'a') name, rownum || 'a' name2</pre>
from dual connect by level <= 1000;

Table created.

begin
dbms_stats.gather_table_stats(
ownname          => user,
tabname          =>'TST_1',
method_opt       => 'for all columns size 1',
degree => 8
);
dbms_stats.gather_table_stats(
ownname          => user,
tabname          =>'TST_2',
method_opt       => 'for all columns size 1'
);
end;
/
PL/SQL procedure successfully completed.

alter table tst_1 inmemory;

Table altered.

select count(*) from tst_1;

COUNT(*)
----------
200000000

Waiting for in-memory segment population:

select segment_name, bytes, inmemory_size from v$im_segments;

SEGMENT_NAME         BYTES INMEMORY_SIZE

--------------- ---------- -------------

TST_1           4629463040    3533963264

Now let’s make a simple two table join:

select name, sum(val) from tst_1 a, tst_2 b where a.ref_id = b.ref_id and name2='50a'
group by name;

Elapsed: 00:00:00.17

Query runs pretty fast. Execution plan has the brand new vector transformation

Execution Plan
----------------------------------------------------------
Plan hash value: 213128033

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                          |     1 |    54 |  7756  (21)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION        |                          |       |       |            |          |
|   2 |   LOAD AS SELECT                  | SYS_TEMP_0FD9D66FA_57B2B |       |       |            |          |
|   3 |    VECTOR GROUP BY                |                          |     1 |    24 |     5  (20)| 00:00:01 |
|   4 |     KEY VECTOR CREATE BUFFERED    | :KV0000                  |     1 |    24 |     5  (20)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL            | TST_2                    |     1 |    20 |     4   (0)| 00:00:01 |
|   6 |   HASH GROUP BY                   |                          |     1 |    54 |  7751  (21)| 00:00:01 |
|*  7 |    HASH JOIN                      |                          |     1 |    54 |  7750  (21)| 00:00:01 |
|   8 |     VIEW                          | VW_VT_377C5901           |     1 |    30 |  7748  (21)| 00:00:01 |
|   9 |      VECTOR GROUP BY              |                          |     1 |    13 |  7748  (21)| 00:00:01 |
|  10 |       HASH GROUP BY               |                          |     1 |    13 |  7748  (21)| 00:00:01 |
|  11 |        KEY VECTOR USE             | :KV0000                  |   200K|  2539K|  7748  (21)| 00:00:01 |
|* 12 |         TABLE ACCESS INMEMORY FULL| TST_1                    |   200M|  1716M|  7697  (21)| 00:00:01 |
|  13 |     TABLE ACCESS FULL             | SYS_TEMP_0FD9D66FA_57B2B |     1 |    24 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("NAME2"='50a')
   7 - access("ITEM_5"=INTERNAL_FUNCTION("C0") AND "ITEM_6"="C2")
  12 - inmemory(SYS_OP_KEY_VECTOR_FILTER("A"."REF_ID",:KV0000))
       filter(SYS_OP_KEY_VECTOR_FILTER("A"."REF_ID",:KV0000))

Note
-----
   - vector transformation used for this statement

After having such impressive performance I’ve decided to run the query in parallel:

select /*+ parallel(8) */ name, sum(val) from tst_1 a, tst_2 b
where a.ref_id = b.ref_id and name2='50a'
group by name;

Elapsed: 00:01:02.55

Query elapsed time suddenly dropped from 0.17 seconds to the almost 1 minute and 3 seconds. But the second execution runs in 0.6 seconds.
The new plan is:

Execution Plan
----------------------------------------------------------
Plan hash value: 3623951262

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |    29 |  1143  (26)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                     |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)               | :TQ10001 |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                    |          |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                      |          |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH                   | :TQ10000 |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY                 |          |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |        HASH JOIN                    |          |   200K|  5664K|  1142  (26)| 00:00:01 |  Q1,00 | PCWP |            |
|   8 |         JOIN FILTER CREATE          | :BF0000  |     1 |    20 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  9 |          TABLE ACCESS FULL          | TST_2    |     1 |    20 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|  10 |         JOIN FILTER USE             | :BF0000  |   200M|  1716M|  1069  (21)| 00:00:01 |  Q1,00 | PCWP |            |
|  11 |          PX BLOCK ITERATOR          |          |   200M|  1716M|  1069  (21)| 00:00:01 |  Q1,00 | PCWC |            |
|* 12 |           TABLE ACCESS INMEMORY FULL| TST_1    |   200M|  1716M|  1069  (21)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"."REF_ID"="B"."REF_ID")
   9 - filter("NAME2"='50a')
  12 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"A"."REF_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"."REF_ID"))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 8 because of hint

We can see a Bloom filter instead of key vector, but this is not the issue. Problem is coming from the “dynamic statistics used: dynamic sampling (level=AUTO)” note.
In 10046 trace file I’ve found nine dynamic sampling queries and one of them was this one:

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
  optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
  */ SUM(C1)
FROM
 (SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT /*+
  NO_VECTOR_TRANSFORM ORDERED */ "A"."VAL" "ITEM_1","A"."REF_ID" "ITEM_2"
  FROM "TST_1" "A") "VW_VTN_377C5901#0", (SELECT /*+ NO_VECTOR_TRANSFORM
  ORDERED */ "B"."NAME" "ITEM_3","B"."REF_ID" "ITEM_4" FROM "TST_2" "B" WHERE
  "B"."NAME2"='50a') "VW_VTN_EE607F02#1" WHERE ("VW_VTN_377C5901#0"."ITEM_2"=
  "VW_VTN_EE607F02#1"."ITEM_4")) innerQuery

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     43.92      76.33          0          5          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     43.92      76.33          0          5          0           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  RESULT CACHE  56bn7fg7qvrrw1w8cmanyn3mxr (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   SORT AGGREGATE (cr=0 pr=0 pw=0 time=8 us)
         0          0          0    HASH JOIN  (cr=0 pr=0 pw=0 time=4 us cost=159242 size=2600000 card=200000)
 200000000  200000000  200000000     TABLE ACCESS INMEMORY FULL TST_1 (cr=3 pr=0 pw=0 time=53944537 us cost=7132 size=800000000 card=200000000)
         0          0          0     TABLE ACCESS FULL TST_2 (cr=0 pr=0 pw=0 time=3 us cost=4 size=9 card=1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  asynch descriptor resize                        1        0.00          0.00
  Disk file operations I/O                        1        0.00          0.00
  CSS initialization                              1        0.00          0.00
  CSS operation: action                           1        0.00          0.00
  direct path write temp                       6267        0.02         30.37
********************************************************************************

Vector transformation is disabled, inefficient table order is fixed by the ORDERING hint and we are waiting for hash table creation based on huge TST_1 table.
Dynamic statistics feature has been greatly improved in Oracle 12c  with the support for joins and group by predicates. This is why we have such join during the parse time. Next document has the”Dynamic Statistics (previously known as dynamic sampling)” section inside: Understanding Optimizer Statistics with Oracle Database 12c where the new functionality is described.

Let’s make a simpler test:

select /*+ parallel(2) */ ref_id, sum(val) from tst_1 a group by ref_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 2527371111

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |  1000 |  9000 |  7949  (58)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                   |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                  |          |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                    |          |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH                 | :TQ10000 |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY               |          |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR          |          |   200M|  1716M|  4276  (21)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS INMEMORY FULL| TST_1    |   200M|  1716M|  4276  (21)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of hint

We can see a “dynamic statistics used” note again. It’s a simple query without predicates with the single table with pretty accurate statistics. From my point of view, here is no reason for dynamic sampling at all.
Automatic dynamic sampling was introduced in 11G Release 2. Description of this feature can be found in this document: Dynamic sampling and its impact on the Optimizer.
“From Oracle Database 11g Release 2 onwards the optimizer will automatically decide if dynamic sampling will be useful and what dynamic sampling level will be used for SQL statements executed in parallel. This decision is based on size of the tables in the statement and the complexity of the predicates”.
Looks like algorithm has been changed in 12c and dynamic sampling is triggered in a broader set of use cases.
This behavior can be disabled at statement, session or system level using the fix control for the bug 7452863. For example,
ALTER SESSION SET “_fix_control”=’7452863:0′;

Summary

Dynamic statistics has been enhanced in Oracle 12c, but this can lead to a longer parse time.
Automatic dynamic statistics is used more often in 12c which can lead to a parse time increase in the more cases than before.

email

Interested in working with Vyacheslav? Schedule a tech call.

6 Comments. Leave new

Thanks, Slava!

We have been fighting this behavior too: parsing the query can take literally tens time more than running it. (real application queries using inmemory partitioned tables in parallel)

Do you think this has already been reported as a bug?

Reply
Vyacheslav Rasskazov
January 18, 2015 9:06 pm

Hey, Ivan!

I did not find any information about such dramatic changes in automatic dynamic sampling behavior at 12c. At 11.2 it was possible to mitigate impact of this feature by different statistics gathering methods. At MOS I can see only the old document: Different Level for Dynamic Statistics (Dynamic Sampling) used than the Level Specified (Doc ID 1102413.1). It’s very common to have relatively high number of hard parses in DWH environment since a lot of queries are dynamically generating by reporting/analytic tools. And these DWH queries can run pretty fast with In-Memory option or with cell offloading on Exadata. Maybe I’m too categorical, but from my point of view, “_fix_control”=’7452863:0′ might be a good option, especially for DWH on 12c.

Reply

I found another quirk in DS level 11 (AUTO) behavior:

DS issues behind the scene queries that utilize SAMPLE SQL clause in order to get the statistics it needs . DS can, however, issue the very same internal query multiple times with different SAMPLE values. As expected, the cost of DS in those cases is quite high.

More details can be found here:
https://iiotzov.wordpress.com/2014/01/29/more-on-dynamic-sampling-level-11-auto-in-oracle-12c/

Reply

you might want to add a few more fixes in the fix control. 20636003 , 12914055

Reply
Vyacheslav Rasskazov
February 8, 2016 7:51 pm

Thanks, Abel!

Reply

Hi – we have seen a similar issue in latest upgrade we just did. The feature is called adaptive execution plan in Oracle 12C. We had to disable this feature to ensure consistency of execution plan for the same query.

I am no DBA but i know how we did it, so shoot me any questions you might have on this feature.

Regards,
Jinay

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *