MySQL GROUP BY and DISTINCT Oddity

Jan 21, 2007 / By Alex Gorbachev

Tags:

At the end of my last MySQL post I mentioned strange behavior with GROUP BY and DISTINCT. In Oracle such constructions are not valid and produce an error.

Here is the test table and statements:

create table t1 (c1 int, c2 int);

insert into t1 values (1,10);
insert into t1 values (2,20);
insert into t1 values (3,30);
insert into t1 values (4,11);
insert into t1 values (4,40);
insert into t1 values (5,50);
insert into t1 values (5,12);

select c1,count(*) cnt from t1 group by c1 order by c2;

select distinct c1 from t1 order by c2;


The following two statements generate errors in Oracle for an obvious reason. In the first statement, ORDER BY clause can contain only GROUP BY expressions or aggregate functions so that the values are deterministic for each grouped row.

SQL> select c1,count(*) cnt from t1 group by c1 order by c2;
select c1,count(*) cnt from t1 group by c1 order by c2
                                                    *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

DISTINCT can be considered as a special case of GROUP BY (SELECT DISTINCT c1 from t1 is the same as SELECT c1 from t1 GROUP BY c1) and the same restrictions apply so GROUP BY clause of SELECT DISTINCT can only include expressions or columns already selected.

SQL> select distinct c1 from t1 order by c2;
select distinct c1 from t1 order by c2
                                    *
ERROR at line 1:
ORA-01791: not a SELECTed expression

Unlike Oracle, MySQL allows such combinations. Let’s see how our first statement is working in MySQL. Where in the sort output should MySQL put rows with column c1 4 and 5? Possible locations are between 1 and 2 or after 3.

mysql> select c1,count(*) cnt from t1 group by c1 order by c2;
+------+-----+
| c1   | cnt |
+------+-----+
|    1 |   1 |
|    4 |   2 |
|    2 |   1 |
|    3 |   1 |
|    5 |   2 |
+------+-----+
5 rows in set (0.03 sec)

It turned out that MySQL engine takes for с2 column the first value it comes across. It’s easy to see by adding c2 in select list:

mysql> select c1,count(*) cnt, c2 from t1 group by c1 order by c2;
+------+-----+------+
| c1   | cnt | c2   |
+------+-----+------+
|    1 |   1 |   10 |
|    4 |   2 |   11 |
|    2 |   1 |   20 |
|    3 |   1 |   30 |
|    5 |   2 |   50 |
+------+-----+------+
5 rows in set (0.00 sec)

I remember a while ago I used Microsoft Access and there was an aggregate function first() returning the first value it comes across. It looks that MySQL implicitly does the same to the columns not in GROUP BY list. However, there is no such function in the list of MySQL aggregates.

Statement with DISTINCT behaves similarly.

mysql> select distinct c1 from t1 order by c2;
+------+
| c1   |
+------+
|    1 |
|    4 |
|    2 |
|    3 |
|    5 |
+------+
5 rows in set (0.00 sec)

This MySQL “feature” could save some resources on sorts and aggregates but generally I would avoid it as this is not portable solution. Generally speaking, query output is non-deterministic — it depends on the full table scan implementation and on the physical order of rows in a table. This means that it’s actually a bug and , instead, MySQL should produce and error on those statements.

3 Responses to “MySQL GROUP BY and DISTINCT Oddity”

  • Gary says:

    Interestingly, Oracle applies some smarts too.
    The following works fine:
    select distinct substr(privilege,1,2) from user_tab_privs_recd
    order by substr(privilege,1,2);
    but this fails:
    select distinct substr(privilege,1,2) from user_tab_privs_recd
    order by substr(privilege,1,3);
    Oracle actually works out whether it can derive the order by expression from the selected expression, rather than simply ruling out anything that isn’t actually selected.

  • Gary says:

    Whoops. Meant to say
    THIS one works:
    select distinct privilege from user_tab_privs_recd order by substr(privilege,1,3);
    This one doesn’t
    select distinct substr(privilege,1,2) from user_tab_privs_recd
    order by substr(privilege,1,1)
    so there are limits to its smarts.

  • Explanation is simple. Oracle isn’t smart but just follows the rules. The rule in this case – expression in order by must consist of selected columns/expressions. This should work I believe:

    select distinct substr(privilege,1,2)
    from user_tab_privs_recd
    order by substr(substr(privilege,1,2),1,1)

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>