In part one of this series on Oracle Parallel Query Hints, I looked at PQ_DISTRIBUTE, and part two was about PQ_REPLICATE. Now, for the third part in the series, I will take a look at PX_JOIN_FILTER.
The PX_JOIN_FILTER is a very interesting hint. It has been around since Oracle 10.2, yet it remains undocumented even after more than 10 years. While its behavior is generally understood, the introduction of Oracle Exadata has made this hint significantly more relevant due to specific storage efficiencies.
The Join Filter is a feature used during data re-shuffling between parallel query (PX) processes. It allows each parallel query process to "pre-filter" data before sending it over by using a Bloom filter.
Typically, this occurs with HASH distributions. However, in Exadata environments, this functionality also extends to the BROADCAST feature by leveraging storage indexes. This optimization reduces the volume of rows exchanged between processes, leading to better performance across different configurations:
To illustrate the impact, let's look at an example. First, we examine a plan where join filters are explicitly disabled using NO_PX_JOIN_FILTER
select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) */ * from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1;
In the resulting plan, notice that TLARGE_TWO (Id 11) is scanned and all 100K rows are sent via PX SEND HASH (Id 9) to the join operator.
Now, let's enable the filter with the PX_JOIN_FILTER hint
select /*+PARALLEL(8) PX_JOIN_FILTER(T2) */ * from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1;
In this optimized plan, the following steps occur:
TLARGE, applies the mod5_Id=1 predicate, and creates a Bloom filter (:BF0000) at step 4 (JOIN FILTER CREATE).TLARGE_TWO, the processes apply the JOIN FILTER USE at step 11. This checks for "presence" in the Bloom filter before even attempting to send rows to the next set of slaves.It is important not to confuse Join Filters with Bloom Filter Partition Elimination. While both rely on the same mathematical structure, they serve different roles.
In the example below, you can see both JOIN FILTER and PART JOIN FILTER (Id 5) acting simultaneously. The PART JOIN FILTER is used specifically to skip reading certain partitions entirely, which is a massive win for large delta loads.
select /*+PARALLEL(8) PX_JOIN_FILTER(t2) */ * from tlarge t1 join tlarge_p t2 on t2.id=t1.id where t1.mod5_id=1;
PX_JOIN_FILTER is a powerful tool for improving parallel query performance. While the optimizer is supposed to use it automatically based on cost, manually hinting it can be vital when the optimizer doesn't recognize the potential for reduction, especially in complex ETL designs.
Ready to optimize your Oracle Database for the future?