Blog | Pythian

Oracle parallel query hints reference - part 5: PQ_DISTRIBUTE_WINDOW

Written by Christo Kutrovsky | Aug 8, 2018 4:00:00 AM

Welcome to part 5 of the series.

Parallel Analytics: A 17-Year Evolution

Background: From Oracle 9i to 12.1

First, some background (if you want to skip to the reference part, scroll down). This hint was introduced relatively recently, in Oracle 12.1, where for the first time SQL windowing functions got some proper parallel features. Analytic functions were introduced in Oracle 9i, which was released in 2001 and already supported parallelism.

This is the first time in 17 years that Oracle has improved analytic functions to work better with parallelism. Pretty poor product management, considering Exadata has been out for 10 years, but what do I know about priorities.

The Skew Problem in Parallel Execution

In Oracle versions prior to that, windowing supported parallelism, but when it came to distributing the work across parallel workers, it only supported the SEND RANGE method - i.e. a distributed sort. This often resulted in skew in the data sent to parallel workers, significantly affecting performance.

This was especially the case if the input was already somewhat ordered, which is often true due to the natural order of data loads, or engineered for improved data locality. When that happened, 80% of the parallel workers would complete their work having processed only 10% of the data, while one worker would be left running forever on the rest.
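One way to confirm this kind of skew (a diagnostic sketch, not specific to windowing) is to check the per-worker row counts in V$PQ_TQSTAT right after running the parallel query in the same session:

```sql
-- After executing the parallel query in this session, inspect how many
-- rows each parallel worker handled; wildly uneven NUM_ROWS among the
-- consumers of a RANGE distribution confirms the skew described above.
select dfo_number, tq_id, server_type, process, num_rows
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type, process;
```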

Performance Comparison: Serial vs. Parallel Plans

The SQL MODEL Clause and Partitioning

Oracle 10g introduced the SQL MODEL clause, which allowed some pretty advanced manipulation of the result set - similar to Microsoft Excel formulas. The MODEL language allowed analytical style functions to be used, and a very powerful feature was the introduction of PARTITION BY in the definition of the MODEL.
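For readers who have not used MODEL, here is a minimal, untested sketch of that PARTITION BY syntax, emulating count(*) per key on the tlarge table used in the examples below (the measure and rule are illustrative, not from the original plans):

```sql
-- Illustrative sketch only: PARTITION BY in the MODEL clause lets each
-- key's cells be processed independently, and hence by separate
-- parallel workers.
select mod5_id, id, cnt
from   tlarge
model
  partition by (mod5_id)             -- data can be hash-distributed on this key
  dimension by (id)
  measures (1 as one, 0 as cnt)
  rules ( cnt[any] = sum(one)[any] );  -- row count within each partition
```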

This split the data BY HASH amongst different parallel workers, allowing efficient usage of analytical functions, but it was rather inconvenient as it had to be within the MODEL clause. Even in 12.2, analytical functions are still a bit handicapped, as they are still not PARTITION aware. Here's an example using the tlarge_p table (which is hash partitioned on ID) from my previous blogs in the series:

**Serial plan:**

```sql
select count(*) over (partition by id) as rnk, t.* from tlarge_p t;

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |  88447 |    28M|       |  8042   (1)| 00:00:01 |       |       |
|   1 |  PARTITION HASH ALL          |          |  88447 |    28M|       |  8042   (1)| 00:00:01 |     1 |    16 |
|   2 |   WINDOW SORT                |          |  88447 |    28M|    31M|  8042   (1)| 00:00:01 |       |       |
|   3 |    TABLE ACCESS STORAGE FULL | TLARGE_P |  88447 |    28M|       |  2341   (1)| 00:00:01 |     1 |    16 |
-------------------------------------------------------------------------------------------------------------------
```

**Parallel plan:**

```sql
select /*+ PARALLEL(4) */ count(*) over (partition by id) as rnk, t.* from tlarge_p t;

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |  88447 |    28M|       |   651   (1)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR               |          |        |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)         | :TQ10001 |  88447 |    28M|       |   651   (1)| 00:00:01 |       |       |  Q1,01 | P->S | QC (RAND)  |
|   3 |    WINDOW SORT                |          |  88447 |    28M|    31M|   651   (1)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                |          |  88447 |    28M|       |   650   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH             | :TQ10000 |  88447 |    28M|       |   650   (0)| 00:00:01 |       |       |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR       |          |  88447 |    28M|       |   650   (0)| 00:00:01 |     1 |    16 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS STORAGE FULL| TLARGE_P |  88447 |   28M|       |   650   (0)| 00:00:01 |     1 |    16 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------------------
```

Analyzing Plan Execution in Oracle 12.2

The difference between the two plans is that the serial plan performs a WINDOW SORT for each partition. The WINDOW SORT is under (before) the PARTITION HASH ALL operator, sorting much less data and needing much less memory for the sort. The parallel plan sorts the ENTIRE result set. Basically, it doesn't have the equivalent of Partition-Wise Join in the PARALLEL plan. Partition-Wise Joins work with both serial and parallel plans.

NOTE: Oracle 18c supports PARTITION aware parallel analytics. A new hint is added: USE_PARTITION_WISE_WIF.
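Assuming the new hint follows the usual pattern of parallel hints, an 18c invocation might look like this (untested sketch; only the hint name comes from the note above):

```sql
-- Hypothetical 18c sketch: request partition-wise window processing so
-- each parallel worker sorts only its own hash partitions of tlarge_p.
select /*+ PARALLEL(4) USE_PARTITION_WISE_WIF */
       count(*) over (partition by id) as rnk, t.*
from   tlarge_p t;
```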

Reference Guide: The PQ_DISTRIBUTE_WINDOW Hint

But enough background. Here's what I've discovered about the PQ_DISTRIBUTE_WINDOW hint. Note that as of the writing of this blog, PQ_DISTRIBUTE_WINDOW is still undocumented and there are no references to it in MOS (My Oracle Support).

Usage:

PQ_DISTRIBUTE_WINDOW (@query_block data_distribution_method)
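If you want to see which method the optimizer chose on its own, the hint also shows up, with its method number, in the plan outline. A sketch using the standard DBMS_XPLAN OUTLINE format:

```sql
explain plan for
select /*+ PARALLEL(4) */ count(*) over (partition by id) as rnk, t.*
from   tlarge t;

-- The Outline Data section includes the chosen
-- PQ_DISTRIBUTE_WINDOW(@"SEL$1" <method>) entry.
select * from table(dbms_xplan.display(format => '+OUTLINE'));
```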

So far I've discovered three distribution methods:

Method 1: HASH Distribution Then SORT

The input data set is hashed based on the partition key, and each worker performs a WINDOW SORT for its own data set. This is new in 12.1 and operates in the same manner as the SQL MODEL with PARTITION BY.

Example:

```sql
select /*+ PARALLEL(4) PQ_DISTRIBUTE_WINDOW(@SEL$1 1) */ count(*) over (partition by id) as rnk, t.* from tlarge t;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    100K|    52M|       |   640   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR              |          |        |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10001 |    100K|    52M|       |   640   (1)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    WINDOW SORT               |          |    100K|    52M|    55M|   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE               |          |    100K|    52M|       |   638   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH            | :TQ10000 |    100K|    52M|       |   638   (0)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR      |          |    100K|    52M|       |   638   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS STORAGE FULL| TLARGE |   100K|    52M|       |   638   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------
```

This approach is suitable when the "keys" (the PARTITION BY clause) are relatively equally sized and no single worker will receive a much larger data set to operate on.

Method 2: SORT Then HASH Distribution (Consolidation)

The input data set is sorted by each worker that reads the data, then it is sent via HASH distribution for consolidation. This uses the new WINDOW CONSOLIDATOR BUFFER step. It operates similarly to GBY_PUSHDOWN, where data is aggregated twice: once in each worker's private result set and once in the "final" result set.

Often such pre-aggregation dramatically reduces the amount of data exchanged between workers. And since sort cost grows faster than linearly with data size (O(n log n)), sorting several smaller result sets is generally cheaper than one large sort.

Example:

```sql
select /*+ PARALLEL(4) PQ_DISTRIBUTE_WINDOW (@SEL$1 2) */ count(*) over (partition by mod5_id) as rnk, t.* from tlarge t;

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |    100K|    52M|   640   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                |          |        |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)          | :TQ10001 |    100K|    52M|   640   (1)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    WINDOW CONSOLIDATOR BUFFER  |          |    100K|    52M|   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                 |          |    100K|    52M|   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH              | :TQ10000 |    100K|    52M|   640   (1)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       WINDOW SORT              |          |    100K|    52M|   640   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR       |          |    100K|    52M|   638   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS STORAGE FULL| TLARGE |    100K|    52M|   638   (0)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------
```

This method is slightly more CPU-intensive, as data is sorted once in the worker that reads the data, then re-shuffled and consolidated in another parallel worker. This could require twice the amount of memory for sorting. This method is suitable when the "keys" (PARTITION BY clause) could be skewed, and it's best to "distribute" the skew in the reader side as opposed to in the final receiving side.
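To decide between Method 1 and Method 2 for a given query, a quick look at the key distribution helps (a simple sketch against the tlarge table used above):

```sql
-- If a handful of keys dominate the row counts, HASH-distributing the
-- raw rows (Method 1) overloads a few workers; pre-sorting and
-- consolidating (Method 2) spreads that skew across the readers instead.
select mod5_id, count(*) as rows_per_key
from   tlarge
group  by mod5_id
order  by rows_per_key desc;
```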

Method 3: The Classical SEND RANGE Then SORT

This is the "classical" operator - the same functionality since Oracle 9i. In my testing, it's a "fallback" option when the partition key and order keys are not the same. Data is sent from PQ readers to PQ receivers based on set "split" points, and each PQ receiver sorts the data.

Example:

```sql
select /*+ PARALLEL(4) PQ_DISTRIBUTE_WINDOW (@SEL$1 3) */ count(*) over (partition by mod5_id order by id) as rnk, t.* from tlarge t;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    100K|    52M|       |   640   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR              |          |        |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10001 |    100K|    52M|       |   640   (1)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    WINDOW SORT               |          |    100K|    52M|    55M|   640   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE               |          |    100K|    52M|       |   638   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE           | :TQ10000 |    100K|    52M|       |   638   (0)| 00:00:01 |  Q1,00 | P->P | RANGE      |
|   6 |       PX BLOCK ITERATOR      |          |    100K|    52M|       |   638   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS STORAGE FULL| TLARGE |   100K|    52M|       |   638   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------
```

In the above query, you cannot hint a switch to Method 1 or Method 2. The "order by" clause disables the pure multi-set aggregation and converts it into a running total, which requires the classical SEND RANGE distribution.

---

That's it for now. Feel free to comment if you discover more methods and more scenarios for the enhanced analytical-function data distributions.
