Three Clever MySQL Diagnostic Tools

Dec 15, 2008 / By Gerry Narvaja

Last week I had to confront one of those situations where you can't really tell what is going on with a piece of software, and where the final conclusion would sound completely crazy if it had been proposed as the initial hypothesis. The regular MySQL commands and utilities fall short in these cases, so I resorted to the three tools reviewed in this article.

The problem we were diagnosing was this: at some point in time, a number of queries that used to take less than one minute to execute started to take between five and 15 minutes. We needed to gain insight into what was going on inside the MySQL server.

MySQL Tuner

At some point in a long diagnosis, the output of MySQL's SHOW [GLOBAL] VARIABLES and SHOW [GLOBAL] STATUS becomes nothing more than a long list of numbers. Going through a teammate's notes on another issue, I came across MySQL Tuner. This is an extremely simple tool that takes the information from the SHOW statements mentioned above and combines it in a useful way. Sample run:

 >>  MySQLTuner 1.0.0 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering
[!!] Successfully authenticated with no password - SECURITY RISK!

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.51a-3ubuntu5.4
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in InnoDB tables: 16K (Tables: 1)
[OK] Total fragmented tables: 0

-------- Performance Metrics -------------------------------------------------
[--] Up for: 9h 0m 49s (70 q [0.002 qps], 13 conn, TX: 24K, RX: 2K)
[--] Reads / Writes: 100% / 0%
[--] Total buffers: 58.0M global + 2.6M per thread (100 max threads)
[OK] Maximum possible memory usage: 320.5M (18% of installed RAM)
[OK] Slow queries: 0% (0/70)
[OK] Highest usage of available connections: 1% (1/100)
[OK] Key buffer size / total MyISAM indexes: 16.0M/48.0K
[!!] Query cache efficiency: 0.0% (0 cached / 7 selects)
[OK] Query cache prunes per day: 0
[OK] Temporary tables created on disk: 0% (0 on disk / 8 total)
[OK] Thread cache hit rate: 92% (1 created / 13 connections)
[OK] Table cache hit rate: 86% (40 open / 46 opened)
[OK] Open file limit used: 7% (78/1K)
[OK] Table locks acquired immediately: 100% (53 immediate / 53 locks)
[OK] InnoDB data size / buffer pool: 16.0K/8.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
    MySQL started within last 24 hours - recommendations may be inaccurate
    Enable the slow query log to troubleshoot bad queries
Variables to adjust:
    query_cache_limit (> 1M, or use smaller result sets)

The beauty and usefulness of this tool lies in its simplicity. In the case at hand, its report let us immediately rule out temporary tables and the various buffer and cache settings as the source of our issues.
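To illustrate the kind of arithmetic MySQL Tuner does, here is a minimal Python sketch of a few of its ratios. The formulas are my approximations of what the script reports (the tool itself is written in Perl), and the counter values are taken from the sample run above.

```python
# Sketch of how a tool like MySQL Tuner derives its ratios from
# SHOW GLOBAL STATUS counters. These formulas approximate what the
# script reports; they are not its actual source code.

def derived_metrics(status):
    """status: dict of counter name -> value, as from SHOW GLOBAL STATUS."""
    selects = status["Com_select"]
    qc_hits = status["Qcache_hits"]
    total_selects = qc_hits + selects
    return {
        # Fraction of SELECTs answered from the query cache.
        "query_cache_efficiency": qc_hits / total_selects if total_selects else 0.0,
        # Fraction of connections that reused a cached thread.
        "thread_cache_hit_rate": 1 - status["Threads_created"] / status["Connections"],
        # Fraction of table opens served from the table cache.
        "table_cache_hit_rate": status["Open_tables"] / status["Opened_tables"],
        # Fraction of implicit temporary tables that spilled to disk.
        "tmp_tables_on_disk": (status["Created_tmp_disk_tables"] / status["Created_tmp_tables"]
                               if status["Created_tmp_tables"] else 0.0),
    }

# Counters matching the sample run above: 7 selects, 0 cache hits,
# 1 thread created over 13 connections, 40/46 table cache, 0/8 tmp tables.
sample = {
    "Com_select": 7, "Qcache_hits": 0,
    "Threads_created": 1, "Connections": 13,
    "Open_tables": 40, "Opened_tables": 46,
    "Created_tmp_disk_tables": 0, "Created_tmp_tables": 8,
}
print(derived_metrics(sample))
```

Running this reproduces the 92% thread cache and 86% table cache figures from the report, which is the point: the raw counters were always there, the tool just divides them for you.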

Maatkit’s Query Profiler

This tool, part of the Maatkit toolset, provides timing information for the statements in a SQL script file. This allows a lot of flexibility in the kinds of tests you can perform: for example, running the same set of queries with different buffer sizes. Sample run:

./mk-query-profiler.pl --database data test.sql

+----------------------------------------------------------+
|                      2 (0.7788 sec)                      |
+----------------------------------------------------------+

__ Overall stats _______________________ Value _____________
   Total elapsed time                        1.178
   Questions                                19
     COMMIT                                  0
     DELETE                                  0
     DELETE MULTI                            0
     INSERT                                  0
     INSERT SELECT                           0
     REPLACE                                 0
     REPLACE SELECT                          0
     SELECT                                  2
     UPDATE                                  0
     UPDATE MULTI                            0
   Data into server                       1648
   Data out of server                     3808
   Optimizer cost                       164373.559

__ Table and index accesses ____________ Value _____________
   Table locks acquired                     40
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                          156976
     Range sorts                             2
     Merge passes                            6
     Table scans                             0
     Potential filesorts                     0

Profiling the queries against data from before and after the incident, we could see that the number of table scans and potential filesorts differed. This was the first clue that the problem lay in the queries' execution plans. We prepared scripts with different SELECT alternatives and were able to quickly compare the changes in these numbers.
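The profiler's approach can be sketched in a few lines: snapshot the session status counters before the script runs, snapshot them again after, and subtract. This is a simplified Python illustration of that technique, not Maatkit's actual Perl code; the counter names are real MySQL status variables, but the snapshot values are invented, sized to match the sample run above.

```python
# Minimal sketch of the before/after diffing that mk-query-profiler
# performs around a SQL script. Snapshot values here are illustrative.

def diff_status(before, after):
    """Return only the counters that changed while the queries ran."""
    return {name: after[name] - before[name]
            for name in after
            if after[name] != before.get(name, 0)}

# Hypothetical snapshots of SHOW SESSION STATUS, taken around the script.
before = {"Select_scan": 2, "Sort_rows": 1000,
          "Sort_merge_passes": 0, "Table_locks_immediate": 10}
after  = {"Select_scan": 2, "Sort_rows": 157976,
          "Sort_merge_passes": 6, "Table_locks_immediate": 50}

print(diff_status(before, after))
```

In this toy example, Sort_rows and Sort_merge_passes jumped while Select_scan stayed flat; comparing those deltas between the good and bad servers is exactly the kind of signal that pointed us at the execution plan.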

Maatkit’s Visual Explain

Another tool from the Maatkit set, this one simply renders the EXPLAIN output as a tree using regular printable characters. Sample output:

./mk-visual-explain.pl --database data -c test_expl.sql
JOIN
+- Bookmark lookup
|  +- Table
|  |  table          table1
|  |  possible_keys  PRIMARY
|  +- Unique index lookup
|     key            table1->PRIMARY
|     possible_keys  PRIMARY
|     key_len        4
|     ref            data.table0.CID
|     rows           1
+- JOIN
   +- Unique index lookup
   |  key            table5->PRIMARY
   |  possible_keys  PRIMARY
   |  key_len        4
   |  ref            data.table0.STID
   |  rows           1
   +- JOIN
      +- Filter with WHERE
      |  +- Bookmark lookup
      |     +- Table
      |     |  table          table4
      |     |  possible_keys  PRIMARY,Index_5,Index_6
      |     +- Unique index lookup
      |        key            table4->PRIMARY
      |        possible_keys  PRIMARY,Index_5,Index_6
      |        key_len        8
      |        ref            data.table0.OID,data.table0.CID
      |        rows           1
      +- JOIN
         +- Filter with WHERE
         |  +- Bookmark lookup
         |     +- Table
         |     |  table          table0
         |     |  possible_keys  PRIMARY,OID_idx
         |     +- Index lookup
         |        key            table0->OID_idx
         |        possible_keys  PRIMARY,OID_idx
         |        key_len        4
         |        ref            data.forpeop2_.OPID
         |        rows           1
         +- JOIN
            +- Index lookup
            |  key            table2->PRIMARY
            |  possible_keys  PRIMARY,Index_2,Index_3
            |  key_len        4
            |  ref            data.table3.FPID
            |  rows           1
            +- Filesort
               +- Filter with WHERE
                  +- Index lookup
                     key            table3->Index_5
                     possible_keys  PRIMARY,Index_5
                     key_len        1
                     ref            const
                     rows           36816

In our case, this output clearly provided two pieces of information: on the slow servers, the execution plan began with a temporary table, and the plans differed between the slow and fast servers. This discrepancy explained the timing differences.

Looking at the good plan, we were able to rewrite the query, hinting the optimizer to use the proper indexes. Running this modified SELECT through both the Query Profiler and Visual Explain on the slow servers, we were able to confirm that the problem was in the optimizer and not in the data itself (mysqlcheck didn't reveal any corruption either).
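The rewrite relied on MySQL's index hint syntax. The sketch below is purely illustrative: the table, index, and column names are placeholders (the real schema is proprietary, and PID is invented for the join), but FORCE INDEX is one standard MySQL construct for steering the optimizer onto an index it is ignoring.

```sql
-- Hypothetical illustration only: force the optimizer to use a
-- specific index instead of the plan it built from bad statistics.
SELECT t3.FPID
FROM table3 AS t3 FORCE INDEX (Index_5)
JOIN table2 AS t2 ON t2.PID = t3.FPID
WHERE t3.status = 1;
```

The trade-off with hints like this is that they freeze the plan: if the data distribution shifts again, the hinted index may become the wrong choice, so they are best treated as a workaround while the underlying statistics problem is chased down.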

Conclusion

None of these three tools reveals hidden information or uses obscure algorithms (all three are Perl scripts); they simply present the existing data in a clever format. When I am looking at complex information or subtle differences in the data, tools like these are the ones that make my life easier.

Appendix: Note on the Actual Problem

As noted above, we found an issue in the MySQL optimizer. Changes in about 8M worth of data (out of 170M) shifted the internal MySQL statistics enough to make the optimizer choose a bad plan. We ran ANALYZE TABLE and OPTIMIZE TABLE without any positive results. Unfortunately, we can't file a bug report because the data and database structure are proprietary. At the time of the incident, the server in use was version 5.0.17.

3 Responses to “Three Clever MySQL Diagnostic Tools”

  • Arjen Lentz says:

    Any 5.0.xx lower than 32 should be considered highly dangerous, and not suitable for production use. Lots of nasty bugs everywhere, including with InnoDB, the optimiser, and replication. Hmm, that’s about all the key bits you really need to work well ;-)

    Sometimes old(er) versions are fine, sometimes they are not. IMHO 5.0.17 is a case of not.

  • The Developer Day » Blog Archive » Advice to MySQL, Secure file uploads & more says:

    […] applications at once it is vital to ensure that everything is working fine. Found a nice post about MySQL diagnostic tools. Should check them out on our production servers. Seems […]

  • Although the files were initially created under 5.0.17, one of the first steps we took was to upgrade to 5.0.67. The ANALYZE and OPTIMIZE commands were run under this release. So either the damage done earlier couldn't be fixed, or the bug is still present in 5.0.67.
