# Issues with Plan Cache Reuse & Row Goal Optimization

Oct 8, 2015

I am presenting here on behalf of my colleague Fabiano Amorim (he is busy resolving other exciting performance issues... :-D ) . Fabiano had an interesting case with one of our customers that is very common in SQL Server. The case is about a performance issue caused by two optimizer decisions not working well together:

### Problem Description

Let's review the following query:

select top 1 col_date from tab1 where col1 = 10 and col2 = 1 and col3 = 1 order by col_date asc

Table tab1 have two indexes:
1. ix1 (col1, col_date, col2) include(col3)
2. ix2 (col1, col2, col3) include(col_date)
The Query optimizer (QO) has two query plan options:
1. select -> top -> filter -> index seek (ix1) Read the ordered index ix1 by b-tree seeking by “col1 = 10”, apply the residual predicates (filter) “col2 = 1 and col3=1”, after reading just 1 row (TOP 1) the execution is finished since the index is ordered by col1, col_date, the first col_date returned is already the TOP1 ASC according to the index order.
2. select -> top N sort -> index seek (ix2) Read the covered index ix2 b-tree (notice it has all needed columns), seeking by “col1 = 10 and col2 = 1 and col3=1”, get the col_date in the index leaf level (included column), use “top N sort” algorithm to sort and keep only TOP 1 row, finish execution.
The problem, is that, if the QO chooses the first option, this will be good for high selectivity predicates. For instance, let’s suppose that “col1 = 10” returns 5 rows; remember that index ix1 is ordered by col1, col_date, col2:   col1 | col2| col3 | col_date 10 | 4 | 4 | 2015-12-01 10 | 3 | 3 | 2015-12-02 10 | 1 | 1 | 2015-12-03 10 | 5 | 5 | 2015-12-04 10 | 2 | 2 | 2015-12-05   After seeking the index, SQL will need to apply the residual predicate (“col2 = 1 and col3=1”) until it finds the "row goal": TOP iterator is asking for just one row, in this case the third row will match the predicate and SQL Server will return the first row that matches the residual predicate. So, in this case it has to read only 3 rows. So far so good... Now, let’s supposed SQL created that plan, and now it's going to reuse it for a new value on col1 filter:

select top 1 col_date from tab1 where col1 = 99 and col2 = 1 and col3 = 1 order by col_date asc

What if after the seek (“col1 = 99”) 2 million of rows are returned? Now this plan is not so good, since it will need to apply the predicate on many rows before it finds a match:   col1 | col2| col3 | col_date 99 | 2 | 2 | 2015-12-01 99 | 2 | 2 | 2015-12-02 ...after a couple of million rows… 99 | 1 | 1 | 2015-12-03 99 | 2 | 2 | 2015-12-04 99 | 2 | 2 | 2015-12-05   In this case, using the second option is better. Just go and seek the b-tree for all values (col1 = 99 and col2 = 1 and col3 = 1), this will return 1 row… TOP n SORT will do almost nothing and execution will finish quickly. Here is the problem: most of the times, SQL knows whether to use option 1 or option 2 based on the parameters values. But if it is reusing the plan from cache, the optimization path may already be set improperly resulting in the known issue called "parameter sniffing" (plan reuse that is wrong for the specific set of rows)… That means that the row goal optimization should not be used if there is a covered indx. Unfortunately by default, QO "thinks" this is cheaper than “seek+top n sort”… Of course it all depends on the distribution of data…So in a nutshell, QO chooses rowgoal optimization where this should not be used therefore we should pay extra attention to those kind of plans…

### Possible Solutions

There are many alternatives to fix it. Some examples:
1. Force the index (index=ix2)
2. Option(recompile)
3. drop the index ix1, define ix2 as a unique (tells QO that only 1 row will be returned)
Each one of the above has advantages and disadvantages. We also need to ensure that statistics are up to date!