Bad SQL or MySQL Bug?

Apr 2, 2008 / By Sheeri Cabral

Tags: ,

One of my colleagues made a typo in a query today that led to me discovering this issue — I think it’s a bug, though it may just be how the language is defined. But certainly a subquery that cannot run should not act as if it returns TRUE.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table foo (fid tinyint unsigned not null primary key); create table bar (bid char(1)
not null default ”);
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> insert into foo (fid) VALUES (1),(2); insert into bar (bid) VALUES (‘1′),(‘a’);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> explain select * from foo where fid in (select fid from bar)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: foo
type: index
possible_keys: NULL
key: PRIMARY
key_len: 1
ref: NULL
rows: 2
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: bar
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2
Extra: Using where
2 rows in set (0.00 sec)

mysql> select * from foo where fid in (select fid from bar);
+—–+
| fid |
+—–+
| 1 |
| 2 |
+—–+
2 rows in set (0.00 sec)

mysql> select fid from bar;
ERROR 1054 (42S22): Unknown column ‘fid’ in ‘field list’
mysql>

Part of me thinks there’s an implicit join going on or something, but I’m not sure how that’s working.

So, should select * from foo where fid in (select fid from bar); work? If so, why is that?

13 Responses to “Bad SQL or MySQL Bug?”

  • Bill Karwin says:

    Think about the query if you put correlation names on each table.

    Force fid in the subquery to use the outer correlation name:

    select f.* from foo f where f.fid in (select f.fid from bar b);

    Force fid in the subuqery to use the inner correlation name (ERROR because b.fid doesn’t exist):

    select f.* from foo f where f.fid in (select b.fid from bar b);

    Force the subquery to use b.bid instead of b.fid:

    select f.* from foo f where f.fid in (select b.bid from bar b);

  • Ken Jacobs says:

    Yes, it should work. The “fid” reference in the subquery is in fact a reference to the value of fid of the outer query. The notion is that this is a “correlated subquery”, and this feature enables you to execute the inner query for each row of the outer query result, and even compare values to the values of columns in the outer query … for example:

    select * from foo where exists
    (select * from bar where bid > fid)

    You could, and probably should always qualify the name of the tables with “aliases”, as in

    select * from foo f where exists
    (select * from bar b where b.bid > f.fid)

    Hope that helps!

  • Garrett says:

    There is no bug here, this is the proper result from this query. The value of fid is used for each row in bar. Thus the effect of the query is that sub-query will return 2 rows with the same value as the current fid, for each fid. This was covered in the first day of class..

    [Response: Then Garret, why didn’t the query return 4 rows then? There are 2 rows in foo and 2 rows in bar, by your reasoning I should get the result set:

    (1,2,1,2)

    I did not do a SELECT DISTINCT…..]

  • I cannot discover any logical error. Obviously, the query is most likely wrong semantically.

    The short explanation is that
    1) fid is resolvable in the outer query – i.o.w it exists already at the topmost level
    2) If fid exists at the topmost level, it exists also at the levels beneath that unless masked by a nearer scope
    3) the subquery on bar does not define a fid, so fid unambigously points to the fid of the outermost query.
    4) the subquery on bar may reference fid from the outermost query.

    That says it all.

    As a slightly longer explanation, would you have been surprised that this works:

    select fid
    from foo
    where exists (
    select null
    from bar
    where bid = fid
    )

    Now if you accept this, what about this:

    select fid
    from foo
    where exists (
    select fid
    from bar
    where bid = fid
    )

    Logically this should not be objectionable – if the WHERE clause can refer to fid, why not the SELECT list?

    So if we accept this, then should be able to except that if

    select fid
    from bar
    where bid = fid

    executes *within the current context* then surely,

    select fid
    from bar

    must execute too within that same context.

    Your last query does not include that context though – fid is not defined when this is used as the outermost query – hence, you get the “unknown column” error.

    You say first “certainly a subquery that cannot run” but the whole point is that *because it is a subquery* it can reference items defined elsewhere (in some enclosing scope). So it can run if it is embedded within its particular context. It may of course not be able to run when you strip away the context.

  • Oh, I forgot to close the circle:

    “So, should select * from foo where fid in (select fid from bar); work? If so, why is that?”

    yes, it should work as the subquery always selects the exact same value for fid as present for the current row in the outer query. So, both occurrences of fid refer to the exact same value and thus, by definition fid in (…fid…) is always true.

    Roland.

  • muzazzi says:

    Of course it should work. :) This is basically saying “Give me all the fids if there exists at least one row in bar”. The subquery returns the current ‘fid’ for every row in bar, of which the current fid will always be a member. I might note that it’s the same behavior in Postgres as well. Namespace works from inside out, fid isn’t in the scope of the subquery, so it becomes dependent on the parent query. If you explicitly defined the namespace as in:
    select * from foo where fid in (select foo.fid from bar)

    would you still not expect it to work?

  • kimseong says:

    It should be treated as a dependent subquery with fid referring to the table on the outer query.
    If table bar is empty, then there should not be any result.

  • Gary says:

    select * from foo where fid in (select fid from bar);
    should work because you also want this (and similar) to work….
    select * from foo where 1 = (select count(*) from bar where fid=bid);
    so fid and bid are both ‘in scope’ for the sub-query.
    what shouldn’t work is
    select * from foo where fid in (select bar.fid from bar);

  • I would say it’s a bug…

    My guess is, the optimizer is evaluating fid as a string, thus returning results:

    mysql> select * from foo where ‘fid’ in (select ‘fid’ from bar);
    +—–+
    | fid |
    +—–+
    | 1 |
    | 2 |
    +—–+
    2 rows in set (0.00 sec)

    You should enter a bug for it I think…

  • I think the issue is that fid in implies that fid should match the values from the subselect:

    mysql> select * from foo where fid in (select 90 from bar);
    Empty set (0.00 sec)

    I’m in agreement with Gary and kimseong

  • Sheeri Cabral says:

    *nod* The EXPLAIN was more than enough for me to see that it was a dependent subquery. And empirically, I understood that it was the equivalent of

    select fid from foo where fid in (select true);

    I just wasn’t sure how the optimizer got to that point — Roland’s comments helped the most for that.

    Basically,
    select fid from foo where fid in (select fid from bar)
    turns into
    select fid from foo where fid in (select 1 from bar)
    select fid from foo where fid in (select 2 from bar)

    (because 1 and 2 are the values of fid).

  • Ryan Lowe says:

    There is no guessing here:

    mysql> explain extended select * from foo where fid in (select fid from bar);
    +—-+——————–+——-+——-+—————+———+———+——+——+————————–+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+——————–+——-+——-+—————+———+———+——+——+————————–+
    | 1 | PRIMARY | foo | index | NULL | PRIMARY | 1 | NULL | 2 | Using where; Using index |
    | 2 | DEPENDENT SUBQUERY | bar | ALL | NULL | NULL | NULL | NULL | 2 | Using where |
    +—-+——————–+——-+——-+—————+———+———+——+——+————————–+
    2 rows in set, 2 warnings (0.00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
    Level: Note
    Code: 1276
    Message: Field or reference ‘test.foo.fid’ of SELECT #2 was resolved in SELECT #1
    *************************** 2. row ***************************
    Level: Note
    Code: 1003
    Message: select `test`.`foo`.`fid` AS `fid` from `test`.`foo` where (`test`.`foo`.`fid`,(select 1 AS `Not_used` from `test`.`bar` where ((`test`.`foo`.`fid`) = `test`.`foo`.`fid`)))
    2 rows in set (0.00 sec)

    mysql>

  • paulm says:

    Interesting that the () do nothing to determine the scope of the SQL.

    I ran this on Oracle as well and got the same result.

    You might as well have written select fid from foo. So the SQL is wrong in that sense and the IN subquery is redundant.

    Interesting enough… if this is rewritten as a join

    select fid from foo join bar on fid = fid;
    it results in a cartesian join which returns all rows in both tables.

    Have Fun

    Paul

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>