
DML Part of a Parallel INSERT Suddenly Started to Execute Serially in Oracle

This post is about investigating a non-obvious reason why an INSERT running in parallel suddenly started taking longer to execute on an Oracle 19.24 single-instance database. The "Note" section of its execution plan reported "PDML disabled because function is not pure and not declared parallel enabled", which caused the DML part of the statement to run serially and slowed it down significantly.

Reproducing the problem

Let's set up a test case that reproduces the issue. We'll need three tables - two will be used as the source, and one as the INSERT's target:

create table src_t1 as select object_id, object_name from dba_objects where object_id is not null;
create table src_t2 as select object_id, owner, object_id col1, object_id col2 from dba_objects where object_id is not null;
create table trg as select object_id, object_name from dba_objects where 1=0;

-- Adding primary keys to the tables
alter table src_t1 add constraint src_t1_pk primary key (object_id);
alter table src_t2 add constraint src_t2_pk primary key (object_id);
alter table trg add constraint trg_pk primary key (object_id);

I'll set the parallelism at the table level to avoid using PARALLEL hints in the INSERT and SELECT parts of the SQL (using hints works just as well):

alter table src_t1 parallel 2;
alter table src_t2 parallel 2;
alter table trg parallel 2;
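
The table-level setting can be verified in user_tables; note that DEGREE is stored as a string (e.g. '2' or 'DEFAULT'):

select table_name, degree
  from user_tables
 where table_name in ('SRC_T1', 'SRC_T2', 'TRG');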

Enable parallel DML at the session level:

alter session enable parallel dml;
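
To confirm that the session-level setting took effect, the parallel DML/DDL/query status of the current session can be checked in v$session (a quick sanity check, run from the same session):

select pdml_status, pddl_status, pq_status
  from v$session
 where sid = sys_context('userenv', 'sid');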

Normal execution

Now let's test the parallel INSERT, which behaves as expected:

SQL> insert /*+APPEND*/ into trg
select src_t1.object_id, src_t1.object_name
  from src_t1, src_t2
 where src_t1.object_id = src_t2.object_id;
  
24146 rows created.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  cf5guzbqwfv41, child number 0
-------------------------------------
insert /*+APPEND*/ into trg select src_t1.object_id, src_t1.object_name
  from src_t1, src_t2  where src_t1.object_id = src_t2.object_id

Plan hash value: 1844907984

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |           |       |       |    18 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                    |           |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10000  | 24146 |   707K|    18   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TRG       |       |       |            |          |  Q1,00 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |           | 24146 |   707K|    18   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   5 |      NESTED LOOPS                  |           | 24146 |   707K|    18   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   6 |       PX BLOCK ITERATOR            |           |       |       |            |          |  Q1,00 | PCWC |            |
|*  7 |        TABLE ACCESS FULL           | SRC_T1    | 24146 |   589K|    17   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  8 |       INDEX UNIQUE SCAN            | SRC_T2_PK |     1 |     5 |     0   (0)|          |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access(:Z>=:Z AND :Z<=:Z)
   8 - access("SRC_T1"."OBJECT_ID"="SRC_T2"."OBJECT_ID")

Note
-----
   - Degree of Parallelism is 2 because of table property

 

A parallel DML statement consists of two parts - the part selecting the data (plan lines 5 to 8), and the DML part performing the INSERT, in plan line 3. The TQ and IN-OUT columns of the execution plan indicate that both parts were performed by the same PX server set (operations 2 through 8 have the same value in the TQ column).

The important point here is that the part of the plan where the actual insert takes place, plan line 3 (LOAD AS SELECT), was performed by the parallel processes. This can also be confirmed by checking the position of the "PX SEND QC" (line 2) and "PX COORDINATOR" (line 1) operations relative to the "LOAD AS SELECT" line: "LOAD AS SELECT" is indented below the "PX" operations, meaning that the query coordinator wasn't involved in inserting the data. The actual load of the data into the table was handled by the parallel processes themselves, so the query coordinator just distributed the work to them and waited for their completion before allowing the INSERT statement to complete.
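
Besides reading the plan shape, the same session can confirm that the DML part was parallelized by querying v$pq_sesstat right after the INSERT (a quick check; if the DML ran in parallel, the "DML Parallelized" statistic is incremented):

select statistic, last_query, session_total
  from v$pq_sesstat
 where statistic in ('DML Parallelized', 'Queries Parallelized');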

Change of execution plan - the DML part is executed serially by the query coordinator

"A change" was made (we'll see later on what it was), causing the same INSERT statement to produce a different execution plan:

SQL> insert /*+APPEND*/ into trg
select src_t1.object_id, src_t1.object_name
  from src_t1, src_t2
 where src_t1.object_id = src_t2.object_id;

24146 rows created.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  cf5guzbqwfv41, child number 0
-------------------------------------
insert /*+APPEND*/ into trg select src_t1.object_id, src_t1.object_name
  from src_t1, src_t2  where src_t1.object_id = src_t2.object_id

Plan hash value: 956496987

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |           |       |       |    18 (100)|          |        |      |            |
|   1 |  LOAD AS SELECT                    | TRG       |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR                   |           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)             | :TQ10000  | 24146 |   707K|    18   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     OPTIMIZER STATISTICS GATHERING |           | 24146 |   707K|    18   (6)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      NESTED LOOPS                  |           | 24146 |   707K|    18   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   6 |       PX BLOCK ITERATOR            |           |       |       |            |          |  Q1,00 | PCWC |            |
|*  7 |        TABLE ACCESS FULL           | SRC_T1    | 24146 |   589K|    17   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  8 |       INDEX UNIQUE SCAN            | SRC_T2_PK |     1 |     5 |     0   (0)|          |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access(:Z>=:Z AND :Z<=:Z)
   8 - access("SRC_T1"."OBJECT_ID"="SRC_T2"."OBJECT_ID")

Note
-----
   - Degree of Parallelism is 2 because of table property
   - PDML disabled because function is not pure and not declared parallel enabled


32 rows selected.

Let’s also validate the distribution of work across parallel execution processes by querying the v$pq_tqstat view (a script similar to this one can be used):
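
A minimal sketch of what such a script might contain (an assumption, not the exact script referenced above; ROW_SHARE and DATA_SHARE are derived with ratio_to_report):

-- pq_tqstat.sql - assumed contents, shown for illustration only
select dfo_number, tq_id, server_type, instance, process, num_rows, bytes,
       round(ratio_to_report(num_rows) over
               (partition by dfo_number, tq_id, server_type) * 100, 2) row_share,
       round(ratio_to_report(bytes) over
               (partition by dfo_number, tq_id, server_type) * 100, 2) data_share,
       waits, timeouts, avg_latency
  from v$pq_tqstat
 order by dfo_number, tq_id, server_type desc, process;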

SQL> @pq_tqstat.sql

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P000                 12453     544336     51.57      51.26         20          0           0
                                             1 P001                 11693     517519     48.43      48.74         19          0           0
                      Consumer               1 QC                   24146    1061855    100.00     100.00         50         18           0



SQL>

The "PDML disabled because function is not pure and not declared parallel enabled" message appears in the "Note" section, and the INSERT part of the statement (plan line 1 - "LOAD AS SELECT") is now performed serially by the query coordinator, using the data it receives in plan line 2. This is confirmed by the "PX SEND QC" operation in plan line 3, which sends the retrieved rows to the query coordinator, as indicated by the "P->S" (parallel to serial) value in the "IN-OUT" column. It is also visible in the v$pq_tqstat output, where the QC (query coordinator) is shown consuming the 24146 rows sent by the two parallel processes (12453 + 11693).

Getting to the root cause

In such "unusual" cases, tracing the SQL's parallel execution might provide additional insights. A good starting point is to follow the steps in the "PX Trace files" section of the My Oracle Support document "SRDC - Data Collection for Query Executing in Parallel Performance Issue (Doc ID 1684432.1)". In this specific case, enabling the optimizer trace is enough. If the SQL is already loaded in the library cache, a hard parse is also needed:

ALTER SESSION SET nls_date_format='DD-MM-YYYY HH24:MI:SS';
ALTER SESSION SET tracefile_identifier='PX_Perf';
ALTER SESSION SET max_dump_file_size = unlimited;
ALTER SESSION SET timed_statistics = true;
ALTER SESSION SET statistics_level=all;

ALTER SESSION SET events 'trace[rdbms.SQL_Optimizer.*] [sql:cf5guzbqwfv41] disk high';

<execute the INSERT>

 
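If the statement was already cached when the trace event was set, one way to force a new hard parse (an assumption about a typical workflow; it requires execute privilege on dbms_shared_pool) is to purge its cursor from the shared pool before re-running the INSERT. The trace file written by the parsing session can then be located via v$diag_info:

-- identify the cursor and purge it to force a hard parse on the next execution
select address || ',' || hash_value as cursor_name
  from v$sqlarea
 where sql_id = 'cf5guzbqwfv41';

exec dbms_shared_pool.purge('<address,hash_value from the query above>', 'C');

-- locate the trace file produced by the tracing session
select value
  from v$diag_info
 where name = 'Default Trace File';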

At first glance, the optimizer trace does not contain any obvious clues as to what could be the cause of such behaviour. However, when checking the most relevant sections of the trace, the following line in the "Join Elimination" query transformation section stands out as an anomaly:

JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
*************************
Join Elimination [JE:R 12.2]    
*************************
.
.
PL/SQL function (TEST_FUNC) is not secure.

At this point we're not sure why "TEST_FUNC" is reported in the trace, what it is, or where it is used or referenced, but the "PL/SQL function (TEST_FUNC) is not secure" message gives us a clue for further investigation. As it turns out, the TEST_FUNC PL/SQL function is referenced by a function-based index created on the src_t2 table:

create or replace package  test_utils
as
    function test_func(p_1 in number,
                       p_2 in number)
        return number deterministic;
end;
/

create or replace package body test_utils
is
    function test_func(p_1 in number,
                       p_2 in number)
        return number deterministic is
      begin
        return coalesce(p_1, p_2);
    end;
end;    
/

create index src_fbi on src_t2(test_utils.test_func(col1, col2));
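
In a real system it may not be obvious which object references the function reported in the trace. Function-based index expressions can be listed from dba_ind_expressions (COLUMN_EXPRESSION is a LONG column, so it's easiest to filter by table or index name):

select index_name, table_name, column_expression
  from dba_ind_expressions
 where table_name = 'SRC_T2';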

 

The problem is that TEST_FUNC doesn't have the PARALLEL_ENABLE property set, and is thus not enabled for parallel execution. The PARALLEL_ENABLE keyword indicates that the function is safe to be executed by multiple PX server processes in parallel.
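
Whether a packaged function is declared parallel-enabled (and deterministic) can be checked in dba_procedures; at this point the PARALLEL column for TEST_FUNC should show NO:

select object_name, procedure_name, parallel, deterministic
  from dba_procedures
 where object_name = 'TEST_UTILS'
   and procedure_name = 'TEST_FUNC';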

Note that the SRC_FBI index is on the source table referenced in the SELECT part of the statement, not on the target table being inserted into. Even though the SRC_FBI index is not used in the final execution plan, its definition is still checked during the optimization phase of the query, and that results in "pdml_reason : function is not pure and not declared parallel enabled", making the DML part run serially. Since the SRC_FBI index is not mentioned anywhere in the execution plan, troubleshooting this issue is a non-trivial task.

How can the issue be fixed?

If dropping the index is not an option, the next "obvious" solution would be to declare the custom PL/SQL function with the PARALLEL_ENABLE attribute:

    function test_func(p_1 in number,
                       p_2 in number)
        return number deterministic PARALLEL_ENABLE;

 

and then rebuild the SRC_FBI index.

However, before doing that we have to be sure that the logic contained in TEST_FUNC is safe to be executed by the parallel processes. One of the checks is to verify the parallel safety of the functions referenced within the parent function. We can check the COALESCE function called inside TEST_FUNC using the following SQL:

SQL> select parallel from dba_procedures where procedure_name = 'COALESCE';

PARALLEL
------------
NO
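
If the function called other user-defined helpers, their parallel safety would have to be verified as well; the objects the package body depends on can be listed from dba_dependencies (built-in functions are not listed individually - every PL/SQL unit simply depends on the SYS.STANDARD package):

select referenced_owner, referenced_type, referenced_name
  from dba_dependencies
 where name = 'TEST_UTILS'
   and type = 'PACKAGE BODY';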

 

There are also other considerations - this very good blog post illustrates how an incorrectly defined PARALLEL_ENABLE PL/SQL function can produce wrong results: Parallel PL/SQL Functions and Global Temporary Tables... and Wrong Results.

If an Oracle pre-defined function is not safe to be executed in parallel, a rewrite may be considered. In our specific case, the COALESCE call could be rewritten as a CASE or an IF statement, and TEST_FUNC then redefined with PARALLEL_ENABLE:

create or replace package  test_utils
as
    function test_func(p_1 in number,
                       p_2 in number)
        return number deterministic PARALLEL_ENABLE;
end;
/



create or replace package body test_utils
is
    function test_func(p_1 in number,
                       p_2 in number)
        return number deterministic PARALLEL_ENABLE is
      begin
        if p_1 is not null then 
          return p_1;
        else 
          return p_2;
        end if;
    end;
end;    
/


alter index src_fbi rebuild online;

After this modification, the DML part of the INSERT is executed in parallel again, using the original execution plan.
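
A quick way to double-check is to re-run the INSERT and look at the "Note" section of the plan again; the "PDML disabled ..." line should no longer be there (BASIC +NOTE is a standard dbms_xplan format option, used here just to keep the output short):

select * from table(dbms_xplan.display_cursor(format => 'BASIC +NOTE'));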

Conclusion

The goal of this post was to illustrate how a seemingly harmless function-based index impacted a parallel INSERT statement. Troubleshooting the issue was not trivial, since the problematic index did not show up in the execution plan.

 
