MySQL – No Index Used With ORDER BY + LIMIT and DISTINCT

Jan 14, 2007 / By Alex Gorbachev

Tags:

Continuing MySQL saga…
This is actually a follow up on my previous post. Developers tried to rewrite all statements and even overdid it. As we say in Russia – “teach fool how to pray and he will break his forehead”. Note, I had to rename columns/tables as I write so sorry for possible typos

Query converted by developers from IN subqueries to joins:

SELECT distinct t.col_1, t.col_2, t.col_3
   FROM t LEFT JOIN t1 ON t.col_3 = t1.col_3
          LEFT JOIN t2 ON t.col_3 = t2.col_3
   WHERE TIMEDIFF(now(),t.col_date) >= '00:05:00'
     AND t1.col_3 is not null
     AND t2.col_3 is not null
     AND t.col_4 = 'string'
   ORDER BY t.col_5 desc
   LIMIT 0,6;

My first take was “smart” use of TimeDiff function in the WHERE clause. I was slightly surprised by the way it was compared – “>= '00:05:00'” but more about this later. I checked selectivity of that condition and found that it was very poor – returning large percent of total rows in the table. Column col_4 contained only 2 values either.

Next, I looked at ORDER BY – created index on col_5 but MySQL didn’t pick it up. Luckily, couple days ago Paul Moen posted about ORDER BY and LIMIT optimization so I knew it should work – MySQL is not that bad in the end. ;-) One more careful look and you notice DISTINCT keyword. Bingo! Obviously, MySQL couldn’t and shouldn’t use index in that case. Interesting that t.col_3 is the primary key so distinct was introduced during conversion from IN subqueries to joins. Well, there is no silver bullet so join is no good here.

In the end query was rewritten as

SELECT 1.col_1, t.col_2, t.col_3
    FROM t
    WHERE TIMEDIFF(now(),t.col_date) >= '00:05:00'
      AND t.col_4 = 'string'
      AND EXISTS (select 1 from t1 where t.col_3=t1.col_3)
      AND EXISTS (select 1 from t2 where t.col_3=t2.col_3)
   ORDER BY t.col_5 DESC
   LIMIT 6;

Note that DISTINCT disappeared and joins were converted to EXISTS conditions. Very handy! Index created on two columns to cover also t.col_4 = 'string':

create index col_4_col_5 on t (col_4,col5);

Result was very positive and response was down to less than 10 ms.

I noticed later that condition with TimeDiff is very tricky and string comparison in the end is not always appropriate. I would rather use AddTime and date comparison.

Just now I figured what kept me uneasy about this statement – DISTINCT causes grouping and ORDER BY column is not part of the select list! In this particular case there is no problem as primary key is in the select list. However, I can’t figure out how MySQL would group and sort if there are several col_5 values per distinct row. I can’t see the deterministic result. In another example today, I witnessed that I can use a column in ORDER BY that is not part of GROUP BY list. Seems like the same issue. If I lost you with this – stay tuned. I must be blogging about it soon.

4 Responses to “MySQL – No Index Used With ORDER BY + LIMIT and DISTINCT”

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>