MySQL – No Index Used With ORDER BY + LIMIT and DISTINCT
Jan 14, 2007 / By Alex Gorbachev
Continuing MySQL saga…
This is actually a follow up on my previous post. Developers tried to rewrite all statements and even overdid it. As we say in Russia – “teach fool how to pray and he will break his forehead”. Note, I had to rename columns/tables as I write so sorry for possible typos
Query converted by developers from IN subqueries to joins:
SELECT distinct t.col_1, t.col_2, t.col_3 FROM t LEFT JOIN t1 ON t.col_3 = t1.col_3 LEFT JOIN t2 ON t.col_3 = t2.col_3 WHERE TIMEDIFF(now(),t.col_date) >= '00:05:00' AND t1.col_3 is not null AND t2.col_3 is not null AND t.col_4 = 'string' ORDER BY t.col_5 desc LIMIT 0,6;
My first take was “smart” use of
TimeDiff function in the
WHERE clause. I was slightly surprised by the way it was compared – “
>= '00:05:00'” but more about this later. I checked selectivity of that condition and found that it was very poor – returning large percent of total rows in the table. Column col_4 contained only 2 values either.
Next, I looked at
ORDER BY – created index on col_5 but MySQL didn’t pick it up. Luckily, couple days ago Paul Moen posted about
ORDER BY and
LIMIT optimization so I knew it should work – MySQL is not that bad in the end. ;-) One more careful look and you notice
DISTINCT keyword. Bingo! Obviously, MySQL couldn’t and shouldn’t use index in that case. Interesting that t.col_3 is the primary key so distinct was introduced during conversion from IN subqueries to joins. Well, there is no silver bullet so join is no good here.
In the end query was rewritten as
SELECT 1.col_1, t.col_2, t.col_3 FROM t WHERE TIMEDIFF(now(),t.col_date) >= '00:05:00' AND t.col_4 = 'string' AND EXISTS (select 1 from t1 where t.col_3=t1.col_3) AND EXISTS (select 1 from t2 where t.col_3=t2.col_3) ORDER BY t.col_5 DESC LIMIT 6;
DISTINCT disappeared and joins were converted to
EXISTS conditions. Very handy! Index created on two columns to cover also
t.col_4 = 'string':
create index col_4_col_5 on t (col_4,col5);
Result was very positive and response was down to less than 10 ms.
I noticed later that condition with
TimeDiff is very tricky and string comparison in the end is not always appropriate. I would rather use
AddTime and date comparison.
Just now I figured what kept me uneasy about this statement –
DISTINCT causes grouping and
ORDER BY column is not part of the select list! In this particular case there is no problem as primary key is in the select list. However, I can’t figure out how MySQL would group and sort if there are several col_5 values per distinct row. I can’t see the deterministic result. In another example today, I witnessed that I can use a column in
ORDER BY that is not part of
GROUP BY list. Seems like the same issue. If I lost you with this – stay tuned. I must be blogging about it soon.