1.617.682.4508

Pythian Blog

The world discusses #Pythian on Twitter. Have a question? Use our hashtag and ask away.

Emergency

24x7 Support

Not a Pythian client but need help now? No problem. Click here.

Are you aware of an existing DBA opening or consulting requirement in your organization? Enter your email for a chance to win one year's access to Safari Books.

  

MySQL GROUP BY and DISTINCT Oddity

By: Alex Gorbachev

At the end of my last MySQL post I mentioned strange behavior with GROUP BY and DISTINCT. In Oracle such constructions are not valid and produce an error.

Here is the test table and statements:

create table t1 (c1 int, c2 int);

insert into t1 values (1,10);
insert into t1 values (2,20);
insert into t1 values (3,30);
insert into t1 values (4,11);
insert into t1 values (4,40);
insert into t1 values (5,50);
insert into t1 values (5,12);

select c1,count(*) cnt from t1 group by c1 order by c2;

select distinct c1 from t1 order by c2;


The following two statements generate errors in Oracle for an obvious reason. In the first statement, ORDER BY clause can contain only GROUP BY expressions or aggregate functions so that the values are deterministic for each grouped row.

SQL> select c1,count(*) cnt from t1 group by c1 order by c2;
select c1,count(*) cnt from t1 group by c1 order by c2
                                                    *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

DISTINCT can be considered as a special case of GROUP BY (SELECT DISTINCT c1 from t1 is the same as SELECT c1 from t1 GROUP BY c1) and the same restrictions apply so GROUP BY clause of SELECT DISTINCT can only include expressions or columns already selected.

SQL> select distinct c1 from t1 order by c2;
select distinct c1 from t1 order by c2
                                    *
ERROR at line 1:
ORA-01791: not a SELECTed expression

Unlike Oracle, MySQL allows such combinations. Let’s see how our first statement is working in MySQL. Where in the sort output should MySQL put rows with column c1 4 and 5? Possible locations are between 1 and 2 or after 3.

mysql> select c1,count(*) cnt from t1 group by c1 order by c2;
+------+-----+
| c1   | cnt |
+------+-----+
|    1 |   1 |
|    4 |   2 |
|    2 |   1 |
|    3 |   1 |
|    5 |   2 |
+------+-----+
5 rows in set (0.03 sec)

It turned out that MySQL engine takes for с2 column the first value it comes across. It’s easy to see by adding c2 in select list:

mysql> select c1,count(*) cnt, c2 from t1 group by c1 order by c2;
+------+-----+------+
| c1   | cnt | c2   |
+------+-----+------+
|    1 |   1 |   10 |
|    4 |   2 |   11 |
|    2 |   1 |   20 |
|    3 |   1 |   30 |
|    5 |   2 |   50 |
+------+-----+------+
5 rows in set (0.00 sec)

I remember a while ago I used Microsoft Access and there was an aggregate function first() returning the first value it comes across. It looks that MySQL implicitly does the same to the columns not in GROUP BY list. However, there is no such function in the list of MySQL aggregates.

Statement with DISTINCT behaves similarly.

mysql> select distinct c1 from t1 order by c2;
+------+
| c1   |
+------+
|    1 |
|    4 |
|    2 |
|    3 |
|    5 |
+------+
5 rows in set (0.00 sec)

This MySQL “feature” could save some resources on sorts and aggregates but generally I would avoid it as this is not portable solution. Generally speaking, query output is non-deterministic — it depends on the full table scan implementation and on the physical order of rows in a table. This means that it’s actually a bug and , instead, MySQL should produce and error on those statements.

3 Responses

  1. Gary says:

    Interestingly, Oracle applies some smarts too.
    The following works fine:
    select distinct substr(privilege,1,2) from user_tab_privs_recd
    order by substr(privilege,1,2);
    but this fails:
    select distinct substr(privilege,1,2) from user_tab_privs_recd
    order by substr(privilege,1,3);
    Oracle actually works out whether it can derive the order by expression from the selected expression, rather than simply ruling out anything that isn’t actually selected.

  2. Gary says:

    Whoops. Meant to say
    THIS one works:
    select distinct privilege from user_tab_privs_recd order by substr(privilege,1,3);
    This one doesn’t
    select distinct substr(privilege,1,2) from user_tab_privs_recd
    order by substr(privilege,1,1)
    so there are limits to its smarts.

  3. Explanation is simple. Oracle isn’t smart but just follows the rules. The rule in this case - expression in order by must consist of selected columns/expressions. This should work I believe:

    select distinct substr(privilege,1,2)
    from user_tab_privs_recd
    order by substr(substr(privilege,1,2),1,1)

Leave a Reply

Start NowWith Pythian - database design, management and emergency handling capabilities...

Pythian Blog

Connecting to Oracle with SQL Server 2005 x64
The quirks of connecting to Oracle from SQL 2005 64
more



Live Updates

pythian: Pythian is now official members of the Microsoft Partner Program. Thanks Peter
more



RSSTestimonials

  • Casey Dyke

    Database Team Manager Service Delivery and Applications , Telstra

    Pythian were recently engaged to take a lead role in a high end infrastructure build project at Telstra. Our requirements were a combination of... more