Oracle Parallel Query Sorting and Index Creation Performance Problems

Nov 16, 2009 / By Christo Kutrovsky


Ever wondered why recreating certain indexes takes forever, even when you do so in parallel? Ever wondered why certain PQ queries just don’t run that fast?

Here’s a serious performance bug that has been in Oracle for a while, and there are finally hints of it being fixed, but only partially.

The bug happens when performing sort operations in parallel on source data that is already well sorted. The “ranger” does a poor job of assigning row ranges to the sorter processes, and ~90% of the rows end up being sent to the same parallel process, regardless of the degree of parallelism. So even if you have 256 CPUs, running the query in parallel achieves only about a 10% performance improvement, instead of a speedup proportional to your degree of parallelism.

For example, if the non-parallel sort/index creation took 45 minutes, running with parallel 32 will take 41 minutes, instead of the possible 1.4 minutes (assuming you have sufficient horsepower).
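As a back-of-envelope check (my arithmetic, not measured timings): if we assume the parallel sort finishes only when the busiest consumer finishes, a 90% skew predicts almost exactly these numbers.

```python
# Rough model (an assumption, not Oracle internals): elapsed time of the
# parallel sort is roughly proportional to the largest share of rows any
# single consumer receives.
serial_minutes = 45.0
skew = 0.90                        # ~90% of rows land on one consumer
dop = 32                           # degree of parallelism

skewed = serial_minutes * skew     # what the skewed distribution gives you
ideal = serial_minutes / dop       # what a perfect range split would give

print(skewed)                      # 40.5 -- close to the observed 41 minutes
print(round(ideal, 1))             # 1.4  -- the "possible" time quoted above
```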

When running a “sort” operation in parallel, there are two sets of parallel processes: producers and consumers. The number of producer/consumer pairs depends on your parallelism settings, so in the case of parallel 32 there are 32 producers and 32 consumers. This is well illustrated in the Oracle documentation HERE (the Oracle web-based documentation is down; I will update the link later). As each producer reads data, it sends each row to the consumer responsible for that row’s “range”. For example, consumer 1 takes A-B, consumer 2 takes C-F, consumer 3 takes G-L, and so on. The exact split is dynamically calculated by the “Ranger” process. Unfortunately, it does not work so well with sorted data.
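To make the mechanism concrete, here is a toy sketch (plain Python, not Oracle internals; all names are mine) of how range distribution is supposed to work: boundaries are derived from a small sample of the data, and each producer routes a row to the consumer that owns its key range. With representative boundaries the split comes out roughly even even for sorted input; the bug is that the real ranger’s boundaries end up funnelling ~90% of a pre-sorted stream to one consumer.

```python
import random

def route(key, boundaries):
    """Send a row to the consumer whose key range contains it."""
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)

random.seed(42)
rows = list(range(400000))                    # perfectly sorted source data
sample = sorted(random.sample(rows, 372))     # the ranger samples a few rows
# one boundary per quantile of the sample -> 3 boundaries for 4 consumers
boundaries = [sample[(i + 1) * len(sample) // 4] for i in range(3)]

counts = [0] * 4
for r in rows:
    counts[route(r, boundaries)] += 1
print(counts)    # roughly 100000 per consumer, unlike the 94/2/2/2 skew below
```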

The same applies to index creation. An index creation is basically a big sort, followed by writing out the result set as a B-Tree structure. It suffers from the exact same ranging issues, at least until 11.2.

Here’s an example:

-- Create mini-sample table
create table mytest_s as select rownum r from dual connect by level <=400000;

-- Fetch only 1 row, no need to fetch all
begin
  for c in ( select /*+PARALLEL(t,4)*/ * from mytest_s t order by 1) loop
    exit;
  end loop;
end;
/
select dfo_number "d", tq_id as "t", server_type, num_rows,
       rpad('x', round(num_rows*10 / nullif(max(num_rows) over
            (partition by dfo_number, tq_id, server_type), 0)), 'x') as "pr",
       round(bytes/1024/1024) mb, process, instance i,
       round(ratio_to_report(num_rows) over
            (partition by dfo_number, tq_id, server_type) * 100) as "%",
       open_time, avg_latency, waits, timeouts,
       round(bytes/nullif(num_rows,0)) as "b/r"
  from v$pq_tqstat
 order by dfo_number, tq_id, server_type desc, process;

d t SERVER_TYPE   NUM_ROWS pr           MB PROCESS I    %  OPEN_TIME AVG_LATENCY   WAITS   TIMEOUTS  b/r
- - ----------- ---------- ----------- --- ------- - ---- ---------- ----------- ------- ---------- ----
1 0 Ranger             372 xxxxxxxxxx    0 QC      1  100          0           0       0          0   11

1 0 Producer        126144 xxxxxxxxxx    1 P012    1   32          0           0      19          2    6
1 0 Producer         47304 xxxx          0 P013    1   12          0           0       8          1    6
1 0 Producer        110376 xxxxxxxxx     1 P014    1   28          0           0      17          2    6
1 0 Producer        116176 xxxxxxxxx     1 P015    1   29          0           0      16          0    6

1 0 Consumer          7885               0 P008    1    2          0           0       6          1    5
1 0 Consumer          7884               0 P009    1    2          0           0       6          1    6
1 0 Consumer          7884               0 P010    1    2          0           0       7          2    6
1 0 Consumer        376347 xxxxxxxxxx    2 P011    1   94          0           0      16          4    6

1 1 Producer          5536 xxxxxxxxxx    0 P008    1   55          0           0     577        568    3
1 1 Producer          4508 xxxxxxxx      0 P011    1   45          0           0     587        571    4

1 1 Consumer           100 xxxxxxxxxx    0 QC      1  100          0           0       1          0  161

As you can see from this test case, the sorter processes (4 consumers) had a very uneven split, with 94% of the rows being sent to a single consumer. I tested the same case with parallel 64; there, 90% of the rows were sent to one consumer, with the remaining 10% evenly distributed across the others.

This essentially means parallelism reduces your execution time by at most 10%.

A very similar thing happens when you create an index. I did, however, catch an anomaly in the test case. With parallel 4 the distribution is 66/34/0/0, while with parallel 8 it is 100/0/0/0/0/0/0/0. In other words: terrible, as all the work is performed by a single process. That is effectively the same as serial, only a little worse, since there is added inter-process communication.

create index mytest_s_i on mytest_s (r) parallel 4;
select dfo_number "d", tq_id as "t", server_type, num_rows,
       rpad('x', round(num_rows*10 / nullif(max(num_rows) over
            (partition by dfo_number, tq_id, server_type), 0)), 'x') as "pr",
       round(bytes/1024/1024) mb, process, instance i,
       round(ratio_to_report(num_rows) over
            (partition by dfo_number, tq_id, server_type) * 100) as "%",
       open_time, avg_latency, waits, timeouts,
       round(bytes/nullif(num_rows,0)) as "b/r"
  from v$pq_tqstat
 order by dfo_number, tq_id, server_type desc, process;

SERVER_TYPE   NUM_ROWS pr          MB PROCESS  I   %  OPEN_TIME AVG_LATENCY      WAITS   TIMEOUTS        b/r
----------- ---------- ----------- -- -------- - --- ---------- ----------- ---------- ---------- ----------
Ranger              12 xxxxxxxxxx   0 QC       1 100          0           0          1          0       3974
Producer        132601 xxxxxxxxxx   2 P004     1  33          0           0         19          1         18
Producer         95265 xxxxxxx      2 P005     1  24          0           0         14          2         18
Producer         95265 xxxxxxx      2 P006     1  24          0           0         14          1         18
Producer         79497 xxxxxx       1 P007     1  20          0           0         11          0         18

Consumer        262308 xxxxxxxxxx   4 P000     1  66          0           0         76         73         18
Consumer           164              0 P001     1   0          0           0         77         74         19
Consumer           164              0 P002     1   0          0           0         76         73         19
Consumer        137364 xxxxx        2 P003     1  34          0           0         76         73         18

Producer             1 xxxxxxxxxx   0 P000     1  25          0           0          0          0        322
Producer             1 xxxxxxxxxx   0 P001     1  25          0           0          0          0        322
Producer             1 xxxxxxxxxx   0 P002     1  25          0           0          0          0        322
Producer             1 xxxxxxxxxx   0 P003     1  25          0           0          0          0        322
Consumer             4 xxxxxxxxxx   0 QC       1 100          0           0          1          0        322

create index mytest_s_i on mytest_s (r) parallel 8;

SERVER_TYPE   NUM_ROWS pr            MB PROCESS  I   %  OPEN_TIME AVG_LATENCY      WAITS   TIMEOUTS        b/r
----------- ---------- ------------- -- -------- - --- ---------- ----------- ---------- ---------- ----------
Ranger               0                0 QC       1              0           0          6          2

Producer         47304 xxxxxxxx       1 P008     1  12          0           0         14          2         18
Producer         51246 xxxxxxxxx      1 P009     1  13          0           0         15          1         18
Producer         47304 xxxxxxxx       1 P010     1  12          0           0         15          2         18
Producer         45220 xxxxxxxx       1 P011     1  11          0           0         15          2         18
Producer         59130 xxxxxxxxxx     1 P012     1  15          0           0         17          3         18
Producer         35478 xxxxxx         1 P013     1   9          0           0         10          3         18
Producer         55188 xxxxxxxxx      1 P014     1  14          0           0         16          1         18
Producer         59130 xxxxxxxxxx     1 P015     1  15          0           0         17          1         18

Consumer        400000 xxxxxxxxxx     7 P000     1 100          0           0         52         49         18
Consumer             0                0 P001     1   0          0           0         52         49
Consumer             0                0 P002     1   0          0           0         52         49
Consumer             0                0 P003     1   0          0           0         52         49
Consumer             0                0 P004     1   0          0           0         52         49
Consumer             0                0 P005     1   0          0           0         52         49
Consumer             0                0 P006     1   0          0           0         52         49
Consumer             0                0 P007     1   0          0           0         52         49

Producer             1 xxxxxxxxxx     0 P000     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P001     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P002     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P003     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P004     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P005     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P006     1  13          0           0          0          0        322
Producer             1 xxxxxxxxxx     0 P007     1  13          0           0          0          0        322

Consumer             8 xxxxxxxxxx     0 QC       1 100          0           0          2          1        322

To further explore the implications of this bug, I created a more elaborate test case. I created several types of data, and tested ordering against each “class.”

create table mytest as
select rownum pk, trunc(dbms_random.value(0,400000)) rnd,
floor(rownum/1000) type_1000, floor(rownum/5) type_5,
mod(rownum,1000) type_1000mod,mod(rownum,5) type_5mod,mod(rownum,10009) type_10000mod,
sysdate-rownum/100 as dt,
rpad('x',10,'x') pad from dual connect by level <=400000;

To spare you some of the testing, here are my results. Query ran:
select /*+PARALLEL(t,[DEGREE])*/* from mytest order by [ORDER BY];

Degree  Order by        Distribution (%)
------  --------------  --------------------
4       pk              94/2/2/2
64      pk              93/0…
4       rnd             28/21/24/27
64      rnd             2/1/2/1…
4       type_1000       94/2/2/2
4       type_5          94/2/2/2
64      type_5          93/0/0…
4       type_5mod       40/20/20/20
8       type_5mod       20/20/0/20/20/0/20/0
8       type_1000mod    17/13/8/6/4/4/4/43
64      type_1000mod    4/3/1/4/2/1…
4       dt              96/2/2/0
64      dt              93/0/0/0…
4       type_5,pk       2/2/2/94
4       type_5,rnd      2/2/2/94
4       type_5mod,pk    20/20/20/39
64      type_5mod,pk    19/19/19/19/19/0/0…
4       type_5mod,rnd   26/24/26/24

A few quick conclusions:

  • The first column of the ORDER BY is what matters for distribution.
  • If the ORDER BY column has repeated values, the PQ sort will be limited by the number of distinct values, but only when equal values are not grouped together.
  • Index creation on time-series data (log tables, stock tables) is slow, and “type” indexes are slow.
  • If you reorder the keys of an index, you may affect the time it takes to create it.
  • Following up on the preceding point: this is especially true if you put low-cardinality columns first to improve compression.
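The distinct-value limit in the second bullet is easy to see with the same kind of toy router (plain Python, not Oracle internals; names are mine): with type_5mod-style data there are only 5 distinct keys, so no choice of range boundaries can keep more than 5 consumers busy, even at parallel 8.

```python
import random

def route(key, boundaries):
    """Send a row to the consumer whose key range contains it."""
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)

random.seed(1)
rows = [i % 5 for i in range(400000)]            # type_5mod-style data
sample = sorted(random.sample(rows, 372))
# 7 boundaries for 8 consumers, taken at quantiles of the sample
boundaries = [sample[(i + 1) * len(sample) // 8] for i in range(7)]

counts = [0] * 8
for r in rows:
    counts[route(r, boundaries)] += 1
print(counts)
print(sum(1 for c in counts if c > 0))           # at most 5 non-empty consumers
```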

One could debate in which cases the data in a table is already ordered, and it is amazing how many such cases there are:

  • Time-series data: ever-growing data. The PK is ordered, and the “insert date” is somewhat ordered.
  • Data warehouses: bulk-load files are often ordered by some condition.
  • Sometimes it is good to reorder a table to improve data locality and compression in data warehouses. This can, however, have a negative effect on index build time.
  • Sometimes one of the intermediate steps returns an ordered set for the final processing.

One example of the last type is analytics. But that’s for a separate blog post.

And finally, to end on an optimistic note: 11.2 appears to have the index creation issue resolved, but ORDER BY in queries is still bad.

12 Responses to “Oracle Parallel Query Sorting and Index Creation Performance Problems”

  • Dion Cho says:

    Thanks for the sharing.

    Do you(OP) have a metalink note to meet this phenomenon?

  • Christo Kutrovsky says:

    Dion, I am not sure what exactly you mean by “meet”, could you elaborate?

  • Dion Cho says:

    Christo.

    Sorry for wrong word. :(

    To rephrase my question -
    “Is there any metalink document that is registered as a bug or something on this phenomenon?”

  • Christo Kutrovsky says:

    I am not aware of any such document, but I haven’t been very thorough in my search.

    As I said, this is semi-fixed in 11g Release 2 (11.2) so Oracle knows about it.

    I will make a separate post with an example at some point.

  • Timur Akhmadeev says:

    Hi Christo,

    an interesting observation. Have you experimented with a bigger table – for ex., 20-40M rows? Since PX is made for that type of volumes, not 400K, it is reasonable to test with appropriate data volumes, especially with such high DOPs.
    I’ve tried to repeat your case – but unfortunately for some reason v$pq_tqstat didn’t show statistics for PX slaves as a Consumer (only Producers). That was 10.2.0.4 & 10.2.0.4.2.
    What Oracle version have you used?

  • Christo Kutrovsky says:

    Actually, that’s how I discovered this: a big table’s parallel operations were taking forever.

    I’ve tested this in 10.2.0.4, 11.1.0.7 and 11.2 base.

    I will try my example with larger tables to confirm it scales the same as when I discovered this.

  • [...] 5-Bug on Running Parallel Query with Order by on already sorted data causes problem? Christo Kutrovsky-Oracle Parallel Query Sorting and Index Creation Performance Problems [...]

  • Zohar says:

    Did you find any metalink documentation since this post was made? We encounter the same problem but can’t find any official documentation of it.

  • Hi Zohar,

    I don’t have any metalink documents to give you. I have been able to work around the issue in some cases, but it’s mostly not going anywhere.

    I do have a blog post almost ready with a larger dataset, but it only confirms the issue further, i.e. not much change.

  • Amir Riaz says:

    The reason you are getting an uneven distribution of rows across producer processes is that that part is running in serial.

    You did not specify a parallel clause for the table, while you did specify one for the index.

    try

    create table big_emp parallel (degree 4)
    as select * from emp;

    CREATE INDEX emp_ix ON emp (emp_id)
    PARALLEL (DEGREE 4);

    It would be interesting to see the results.

    • On queries like the one in my example it is sufficient to have a parallel hint to have the query run fully in parallel.

      My entire test case is documented, I invite you to study the explain plans and re-run my test case.

  • Amir Riaz says:

    Yes, confirmed. It was a bug. I checked all our databases, 10g and 11.1.6. The problem is that when the query goes parallel, the data is unevenly distributed across the slave processes, which increases the query execution time. If 100% of the data is assigned to a single slave process, the time for serial processing is nearly equal to the time for parallel processing; in fact, parallel processing then takes more time.

    I could not find this problem on 11.2.

    I am expecting that you will write something about the ORDER BY clause in 11.2, and let’s open it for discussion then.

    Thanks.
