MySQL GROUP BY and DISTINCT Oddity

Posted in: MySQL, Technical Track

At the end of my last MySQL post I mentioned strange behavior with GROUP BY and DISTINCT. In Oracle such constructions are not valid and produce an error.

Here is the test table and statements:

create table t1 (c1 int, c2 int);

insert into t1 values (1,10);
insert into t1 values (2,20);
insert into t1 values (3,30);
insert into t1 values (4,11);
insert into t1 values (4,40);
insert into t1 values (5,50);
insert into t1 values (5,12);

select c1,count(*) cnt from t1 group by c1 order by c2;

select distinct c1 from t1 order by c2;


The following two statements generate errors in Oracle for an obvious reason. In the first statement, ORDER BY clause can contain only GROUP BY expressions or aggregate functions so that the values are deterministic for each grouped row.

SQL> select c1,count(*) cnt from t1 group by c1 order by c2;
select c1,count(*) cnt from t1 group by c1 order by c2
                                                    *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

DISTINCT can be considered as a special case of GROUP BY (SELECT DISTINCT c1 from t1 is the same as SELECT c1 from t1 GROUP BY c1) and the same restrictions apply so GROUP BY clause of SELECT DISTINCT can only include expressions or columns already selected.

SQL> select distinct c1 from t1 order by c2;
select distinct c1 from t1 order by c2
                                    *
ERROR at line 1:
ORA-01791: not a SELECTed expression

Unlike Oracle, MySQL allows such combinations. Let’s see how our first statement is working in MySQL. Where in the sort output should MySQL put rows with column c1 4 and 5? Possible locations are between 1 and 2 or after 3.

mysql> select c1,count(*) cnt from t1 group by c1 order by c2;
+------+-----+
| c1   | cnt |
+------+-----+
|    1 |   1 |
|    4 |   2 |
|    2 |   1 |
|    3 |   1 |
|    5 |   2 |
+------+-----+
5 rows in set (0.03 sec)

It turned out that MySQL engine takes for с2 column the first value it comes across. It’s easy to see by adding c2 in select list:

mysql> select c1,count(*) cnt, c2 from t1 group by c1 order by c2;
+------+-----+------+
| c1   | cnt | c2   |
+------+-----+------+
|    1 |   1 |   10 |
|    4 |   2 |   11 |
|    2 |   1 |   20 |
|    3 |   1 |   30 |
|    5 |   2 |   50 |
+------+-----+------+
5 rows in set (0.00 sec)

I remember a while ago I used Microsoft Access and there was an aggregate function first() returning the first value it comes across. It looks that MySQL implicitly does the same to the columns not in GROUP BY list. However, there is no such function in the list of MySQL aggregates.

Statement with DISTINCT behaves similarly.

mysql> select distinct c1 from t1 order by c2;
+------+
| c1   |
+------+
|    1 |
|    4 |
|    2 |
|    3 |
|    5 |
+------+
5 rows in set (0.00 sec)

This MySQL “feature” could save some resources on sorts and aggregates but generally I would avoid it as this is not portable solution. Generally speaking, query output is non-deterministic — it depends on the full table scan implementation and on the physical order of rows in a table. This means that it’s actually a bug and , instead, MySQL should produce and error on those statements.

email

Interested in working with Alex? Schedule a tech call.

About the Author

What does it take to be chief technology officer at a company of technology experts? Experience. Imagination. Passion. Alex Gorbachev has all three. He’s played a key role in taking the company global, having set up Pythian’s Asia Pacific operations. Today, the CTO office is an incubator of new services and technologies – a mini-startup inside Pythian. Most recently, Alex built a Big Data Engineering services team and established a Data Science practice. Highly sought after for his deep expertise and interest in emerging trends, Alex routinely speaks at industry events as a member of the OakTable.

3 Comments. Leave new

Interestingly, Oracle applies some smarts too.
The following works fine:
select distinct substr(privilege,1,2) from user_tab_privs_recd
order by substr(privilege,1,2);
but this fails:
select distinct substr(privilege,1,2) from user_tab_privs_recd
order by substr(privilege,1,3);
Oracle actually works out whether it can derive the order by expression from the selected expression, rather than simply ruling out anything that isn’t actually selected.

Reply

Whoops. Meant to say
THIS one works:
select distinct privilege from user_tab_privs_recd order by substr(privilege,1,3);
This one doesn’t
select distinct substr(privilege,1,2) from user_tab_privs_recd
order by substr(privilege,1,1)
so there are limits to its smarts.

Reply
Alex Gorbachev
January 22, 2007 9:21 am

Explanation is simple. Oracle isn’t smart but just follows the rules. The rule in this case – expression in order by must consist of selected columns/expressions. This should work I believe:

select distinct substr(privilege,1,2)
from user_tab_privs_recd
order by substr(substr(privilege,1,2),1,1)

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *