How to tell when using INFORMATION_SCHEMA might crash your database

Feb 16, 2010 / By Sheeri Cabral

Tags: ,

There are those that are very adamant about letting people know that using INFORMATION_SCHEMA can crash your database. For example, in making changes to many tables at once Baron writes:

“querying the INFORMATION_SCHEMA database on MySQL can completely lock a busy server for a long time. It can even crash it. It is very dangerous.”

Though Baron is telling the truth here, he left out one extremely important piece of information: you can actually figure out how dangerous your INFORMATION_SCHEMA query will be, ahead of time, using EXPLAIN.


In MySQL 5.1.21 and higher, not only were optimizations made to the INFORMATION_SCHEMA, but new values were added so that EXPLAIN had better visibility into what MySQL is actually doing. As per http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html there are 6 new “Extra” values for EXPLAIN that are used only for INFORMATION_SCHEMA queries.

The first 2 “Extra” values for EXPLAIN are mostly self-explanatory:
Scanned 1 database – Only one database directory needs to be scanned.
Scanned all databases – All database directories are scanned. This is more dangerous than only scanning one database.

Note that there is no middle ground — there is no optimization to only scan 2 databases; either all database directories are scanned, or only one is. If your query spans more than one database, then all database directories are scanned. Note that this

SHOW statements are less dangerous than using INFORMATION_SCHEMA because they only use one database at a time. If you have an INFORMATION_SCHEMA query that produces an “Extra” value of “Scanned 1 database”, it is just as safe as a SHOW statement.

The optimizations went even further, though. From the most “dangerous” — ie, resource intensive — to the least, here are the other 4 “Extra” values introduced in MySQL 5.1.21 (which, for the record, came out in August 2007, so it is a feature that has been around for 2.5 years at this point):

Open_full_table
Open_trigger_only
Open_frm_only
Skip_open_table

A bit more explanation, and some examples:

Open_full_table – Needs to open all the metadata, including the tables format file (.frm) and data/index files such as .MYD and .MYI. The previously linked to manual page about the optimization includes which information will show each “Extra” type — for example, the AUTO_INCREMENT and DATA_LENGTH fields of the TABLES table require opening all the metadata.

mysql> EXPLAIN SELECT TABLE_SCHEMA,TABLE_NAME,AUTO_INCREMENT FROM TABLES\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Open_full_table; Scanned all databases
1 row in set (0.00 sec)

Let’s see an example that only scans 1 database:

mysql> EXPLAIN TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
ERROR 1109 (42S02): Unknown table 'TABLE_NAME' in information_schema
mysql> EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_full_table; Scanned 1 database
1 row in set (0.00 sec)

Note that “Scanned all databases” will apply if there is any way there could be more than one database. For example, on my test server, only the ‘test’ and ‘sakila’ databases exist (other than ‘mysql’ and ‘INFORMATION_SCHEMA’ of course) and yet when I do

EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA LIKE 'test%'\G

I still get “Scanned all databases”. So be careful.

One of the basic pieces of advice I see to optimize queries can be applied to queries on the INFORMATION_SCHEMA — Do not use SELECT * unless you actually want to get every single piece of information. In the case of INFORMATION_SCHEMA, optimizing your queries can mean the difference between the server crashing and the server staying up.

Open_trigger_only – Only the .TRG file needs to be opened. Interestingly enough, this does not seem to have an example that applies. The manual page says that the TRIGGERS table uses Open_full_table for all fields. When I tested it, though, I did not get anything in the “Extra” field at all — not “Open_trigger_only” and not even “Open_full_table”:

mysql> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 5.1.37-1ubuntu5-log |
+---------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM TRIGGERS\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TRIGGERS
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:
1 row in set (0.00 sec)

Open_frm_only – Only the format file (.frm) of the table needs to be open. Again, check the manual page for the fields that can use this optimization — fields such as CREATE_OPTIONS and ENGINE in the TABLES table do, for example.

Skip_open_table – This is the last new “Extra” feature, and it is the best. This optimization type means that no files need to be opened. The database directories are scanned and information can be obtained — mostly the table name, so when querying only the TABLE_NAME and TABLE_SCHEMA fields from the TABLES table, your query is safe.

So instead of putting your head in the sand and never using the great tool that is the INFORMATION_SCHEMA, first EXPLAIN your query to see if it will work or not.

(Note, if you are still on MySQL 5.0, what are you waiting for? The upgrade to MySQL 5.1 is relatively painless, and Pythian has a comprehensive checklist for how to upgrade while keeping your sanity).

6 Responses to “How to tell when using INFORMATION_SCHEMA might crash your database”

  • Tom Hanlon says:

    Sheeri,

    Thanks for the information, I never thought to run explain against an information_schema query.

    In class I use queries against information_schema to show issues around opening tables, and the table_open_cache and table_definition_cache.

    Hitting some columns in the information_schema tables open tables, and/or table_definition.

    This option gives me new tools to play with.

    Thanks.

    Now.. if what happens if we profile a query against information_schema.. Does the server crash.. or do we learn something interesting ? Or both ?


    Tom Hanlon

  • Hi!

    great post – didn’t know this existed. Unfortunately, it seems my favorite trick to restrict results to the current schema, the SCHEMA() function, doesn’t trigger optimization.

    (I mean, SELECT table_name FROM information_schema.TABLES WHERE TABLE_SCHEMA = SCHEMA() will scan all dirs, whereas the equivalent using a string constant instead of SCHEMA() only scans one dir)

  • Mm, some further testing shows that SCHEMA() actually does work, but only if the default database is set:

    mysql> select schema();
    +----------+
    | schema() |
    +----------+
    | NULL     |
    +----------+
    1 row in set (0.00 sec)
    
    mysql> explain select * from information_schema.tables where table_schema = schema();
    +----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------------------------------+
    | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
    +----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------------------------------+
    |  1 | SIMPLE      | tables | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using where; Skip_open_table; Scanned all databases |
    +----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------------------------------+
    1 row in set (1.95 sec)
    
    mysql> use test;
    Database changed
    mysql> select schema();
    +----------+
    | schema() |
    +----------+
    | test     |
    +----------+
    1 row in set (0.00 sec)
    
    mysql> explain select * from information_schema.tables where table_schema = schema();
    +----+-------------+--------+------+---------------+--------------+---------+------+------+--------------------------------------------------+
    | id | select_type | table  | type | possible_keys | key          | key_len | ref  | rows | Extra                                            |
    +----+-------------+--------+------+---------------+--------------+---------+------+------+--------------------------------------------------+
    |  1 | SIMPLE      | tables | ALL  | NULL          | TABLE_SCHEMA | NULL    | NULL | NULL | Using where; Open_full_table; Scanned 1 database |
    +----+-------------+--------+------+---------------+--------------+---------+------+------+--------------------------------------------------+
    1 row in set (0.00 sec)
    
  • For those of us who write tools such as Maatkit that will run on a variety of server versions, the fixes in 5.1 are too little, too late. INFORMATION_SCHEMA is forever banned for Maatkit. And the same is true for me as a human — I will always use SHOW commands.

  • Shlomi Noach says:

    Sheeri, great info.
    I’m using the comments here to present with the following experience:
    SELECTing from TABLES, COLUMNS with a given schema name (i.e. `sakila`), does indeed make for a “Scanned 1 database”.

    SELECTing from STATISTICS (indexes table) — does NOT, sadly:

    explain select * from information_schema.statistics where index_schema=’sakila’\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: statistics
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: NULL
    Extra: Using where; Open_full_table; Scanned all databases

    Which is strange, since I would put them on the same level.

    With REFERENTIAL_CONSTRAINTS or KEY_COLUMN_USAGE, this does not work, as well. Here, I may reason that FK constraints may refer to tables in two different schemata; so this makes more sense.

  • Shlomi Noach says:

    Hmmmm…
    Prematurely jumped to conclusions. When I do:

    explain select * from information_schema.statistics where table_schema=’sakila’\G

    I DO get “Scanned 1 database”.

    Need to be careful about the right schema column!

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>