Is the query cache useful?

Sep 28, 2009 / By Sheeri Cabral

Tags:

Mark Callaghan posted a good test of the MySQL query cache in different versions. His tests clearly show that in 5.0.44 and 5.0.84 and 5.1.38, there is more query throughput when the query cache is disabled.

However, the tests are skewed — not on purpose, I am sure, and Mark admits he has not used the query cache before — but they are skewed all the same. Mark’s error was that he assumed he could just turn on the query cache and see if it works. Most features of MySQL do not work that way — you have to understand the strengths and weaknesses of the feature in order to use it properly.

Mark’s benchmark definitely reinforces that turning on the query cache without any knowledge of your system is a bad idea, and I agree with him on that. But it does not in any way mean that the query cache is always a bad idea. In fact, the MySQL manual page at http://dev.mysql.com/doc/refman/5.1/en/query-cache.html has some caveats about when the query cache is useful and when it is not useful.

It is important to know how the MySQL query cache works, so I will first explain that, and then explain why Mark’s test is not a very good broad generalization. MySQL’s query cache is not useful all the time, but it can be quite useful! If you are already familiar with the query cache, you can skip ahead to why Mark’s benchmark is skewed.


How to use the MySQL query cache
Firstly, there are a few ways to use the MySQL query cache. You can disable it, which means no queries are cached; you can set it to cache all queries, and use a special keyword to not cache queries (replace SELECT with SELECT SQL_NO_CACHE). query_cache_type=1 or ON will set the query cache in this way.

You can also set it to cache no queries except those that have a special keyword — you guessed it, replace SELECT with SELECT SQL_CACHE and a query will attempt to be cached. To set the query cache in this way, set query_cache_type=2 or DEMAND.

What queries are cached
Regardless of which method you choose, not all possible queries are cached. Only deterministic SELECT statements are cached — a statement such as SELECT count(*) from tbl is deterministic, while SELECT NOW() is non-deterministic for the purpose of the query cache.* Queries that use user-defined functions, temporary tables, LOCK IN SHARE MODE, FOR UPDATE, INFORMATION_SCHEMA or mysql system tables are not cached. Most SELECT subqueries are not cached. SELECT queries that are part of stored functions, triggers, views and events are not cached. However, in MySQL 5.1, SELECT queries that contain a view can be cached. The MySQL manual page at http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html has more detailed information on what is allowed and what is not allowed.

In the query cache, a hash of the query is stored along with the result. There is a variable, query_cache_limit, which sets an upper limit on the size of the result set that can be cached. So even if a query can be cached, it may not be cached if the result set is larger than query_cache_limit.

How the query cache is used
The lifecycle of a non-prepared SELECT statement when the query cache is on (query_cache_type > 0) is:

1) hash the SELECT query
2) check the hash against the query cache
3) if hash matches, returned the result set from the query cache. If not, the query is sent to the parser and executed in the usual manner, and results are stored in the query cache if the query is cachable.

This means that having the query cache on has some overhead, even if no queries are ever cached.

A query (and its result) is removed from the query cache when it is invalidated or pruned. A query is invalidated when any table it refers to has been changed, even if the changes do not affect the actual result set in the query cache. A query is pruned when a new query needs to be inserted into the query cache and there is no room; the query cache prunes queries using a least recently used (LRU) algorithm.

In other words, there will be a lot of unnecessary overhead with the query cache if a table is often written to, because queries using that table will be constantly removed from the query cache. This does not mean that if you have some tables that are frequently written to, you should turn the query cache off — you can use the query_cache_type of ON with the SQL_NO_CACHE keyword to cache all except some queries, or you can use the query_cache_type of DEMAND and the SQL_CACHE keyword in SELECT queries to cache no queries except those you specify.

Why Mark’s Benchmark is skewed

It is important to note that InnoDB was set up with:
innodb_buffer_pool_size=2000M

and the sysbench program was run with:

../sysbench --test=oltp --oltp-table-size=2000000 --max-time=180 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-read-only --num-threads=$nt run

I will explain what the important sysbench variables are in context.

1) The benchmark used sysbench in OLTP mode against an InnoDB table. The InnoDB buffer pool is hot after the table is created and populated, while the query cache is empty and cold.

InnoDB has its own cache — the InnoDB buffer pool. sysbench is run with –oltp-read-only, meaning there are no INSERT, DELETE or UPDATE statements. This is good; it means that the results are not skewed by query cache invalidations, though there may be pruning.

The first thing sysbench does is preparation — it creates the table and populates it with data (2,000,000 rows as per oltp-table-size). After this is done, the InnoDB buffer pool (2000M in size) has gone from empty to containing data. I do not know how large the row size of the sysbench table is; however, I can do some calculations:

2000 Mb * 1024 = 2,048,000 Kb / 2,000,000 rows = 1.024 Kb per row

This means that in order for some data to *not* be current in the InnoDB buffer pool, the row size had to be approximately larger than 1 Kb per row.

The manual at http://sysbench.sourceforge.net/docs/#database_mode shows the schema of the table that is used.

CREATE TABLE `sbtest` ( `id` int(10) unsigned NOT NULL auto_increment, `k` int(10) unsigned NOT NULL default ‘0’, `c` char(120) NOT NULL default ”, `pad` char(60) NOT NULL default ”, PRIMARY KEY (`id`), KEY `k` (`k`)

(note that you at least need to add a final parenthesis “)” for the query to work, and should also specify ENGINE=INNODB if you want to test on your system).

So, that’s two 4-byte integers, and two character fields…for the sake of argument, let’s say you are using UTF-8 in MySQL 5.1, where utf8 is 3 bytes per character, that’s:

4 + 4 + 120*3 + 60*3 = 548 bytes

Let’s say you are using a version of MySQL that uses 4 bytes per character for utf8 (I know 6.0 was using that, not sure about 5.4):

4 + 4 + 120*4 + 60*4 = 728 bytes

So either way, all of the data in the table is already loaded in the InnoDB buffer pool when the test begins. The InnoDB buffer pool is already hot, whereas the query cache is empty.

On systems where a query can be cached and used instead of executing and having to look on disk for data, using the MySQL query cache will give you *much* better performance.

2) The query cache statistics are not given. It’s very possible that the only thing that was measured was how much overhead the query cache provides if it is never used.

The queries that are run are the default (oltp-test-mode is not set, and it defaults to “complex”) which can be seen under the “Advanced transactional” heading in the sysbench manual at http://sysbench.sourceforge.net/docs/#database_mode. Only read queries are done, so the queries are:

SELECT c FROM sbtest WHERE id=N
SELECT c FROM sbtest WHERE id BETWEEN N AND M
SELECT SUM(K) FROM sbtest WHERE id BETWEEN N and M
SELECT c FROM sbtest WHERE id between N and M ORDER BY c
SELECT DISTINCT c FROM sbtest WHERE id BETWEEN N and M ORDER BY c

The values for the variables in the query are chosen at random, though they are sensible values. The testing time is very short, only 3 minutes of running time (sysbench is called with max-time=180). It is quite possible that all the SELECT queries were put into the query cache and there was never a query cache hit. Of course there will be a lot of overhead if the query cache is never used! Particularly if it has to prune for more memory.

I would like to see Qcache_hits, Qcache_inserts, Com_select (number of SELECT statements that did not use the query cache) and Qcache_lowmem_prunes. That is what I look at for any system that has the query cache running, so I can determine the effectiveness of the query cache.

There is more in-depth information on how to find the usefulness of the query cache, query cache fragmentation, and a few ways to cache more queries in the MySQL Administrator’s Bible on pages 427-434 — part of the “Caching with MySQL” chapter. I used the book to help me write this blog post, so I do think it is fair to cite my sources….even though it happens to be a source I created (and that’s why I consider it a shameless plug, I have no shame citing my sources)!

—————-

So that all being said, the query cache is extremely useful when it is used for repeated queries. The benchmark Mark performed most likely did not actually use this. I have seen production environments fall apart when someone turned on on the query cache for all queries; Mark’s benchmark proves that it is not a good idea to randomly turn on the query cache. You have to know what you’re doing.

* for the purpose of replication, it is deterministic, but that’s another topic.

11 Responses to “Is the query cache useful?”

  • Eric Bergen says:

    Do you have any benchmarks that show the query cache being useful in any sane production environment?

    The problem with the query cache is that it sounds like a magical feature when it’s really deeply flawed. Most users aren’t going look at a feature that sounds as good as the query cache before turning it on. They just turn it on. Worse yet they turn it on at a time when they have less traffic and don’t realize that when traffic grows the query cache is hurting their performance.

    I think it’s better to remove a feature that hurts performance for most users even though it may improve performance for a rare few, especially when alternatives exist for that minority.

  • wrene says:

    I wonder if it would be possible to use memcached -> http://www.danga.com/memcached/
    with clustering -> http://code.google.com/p/memcached/wiki/FAQ#Cluster_Architecture_Questions

    to at some determined thingy(technical term) move the query cached queries to memcached cluster.. Of course dispose of them per thingy as needed by some mid tier thingy doing all this juggling.. I’ll bet one could get very nice performance..

    just a thought..theory.. speculation

  • Sheeri Cabral says:

    Eric — I don’t have benchmarks, but I have used it with DEMAND and SELECT queries with the SQL_CACHE keyword in the past and seen dramatic improvements — I don’t have numbers, but I have seen it work.

    I have also seen it make things worse when turned on blindly.

    I think that if we removed every feature that “most people” might turn on without looking at it because doing that causes bad performance, we’d have to turn off lots of features and remove a lot of custom SQL that is currently allowed.

    The query cache is off by default, which is as it should be.

    There are warnings on the manual page, which “most users” will actually read.

  • Sheeri Cabral says:

    Wrene — indeed, and that’s one of the things that Eric meant when he said “especially when alternatives exist”….

    There’s even a memcached storage engine.

    http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/
    states that there are some good reasons to use the query cache:

    “Third party application – You can’t change how it works with MySQL to add caching but you can enable query cache so it works faster.

    Low load applications – If you’re building application which is not designed for extreme load, like many personal application query cache might be all you need. Especially if it is mostly read only scenario. ”

    And I totally agree, with one addition: If you have the time and knowledge to use the query cache appropriately but not to use other tools (such as memcached) appropriately, then it’s the right solution. Using memcached means rewriting code, which a lot of organizations are loath to do. But definitely, it’s a HUGE win, and I recommend it over the MySQL query cache for most cases.

    I just don’t think the query cache should be removed. Perhaps tweaked a bit, so that it won’t check the query cache for every SELECT query, more insight into what’s in the query cache itself, etc.

  • Mark Callaghan says:

    I could have been much more clear.

    The point of my post was that worst-case behavior of the query cache changed significantly from 5.0.44 (bad) to 5.0.84 (better) to 5.1.3X (bad again). Whether or not the query cache is useful is a different issue. Changes in code that lead to drastic changes in performance make stable systems much harder to build.

    Are you aware of this behavior change in the query cache?

  • Sheeri Cabral says:

    Mark — I’m not aware of the behavior, but the optimizer has had some pretty wonky bugs fixed, and other bugs introduced, in that time. The query cache isn’t widely used (because it takes work and tools like memcached are better) so my lack of seeing the issue doesn’t mean it doesn’t exist in the wild….

    My point is that the “noise” from the bad query cache may not have been noticed — at least not yet. I’ve had experience going from 5.0.45 and 5.0.51 to 5.1.31 and up, and experience going to 5.0.84, but I can’t recall going from 5.0.84 to 5.1.3X — there hasn’t been a driving need to upgrade 5.0.84 clients to 5.1 yet…..So maybe I haven’t seen it (and in general it hasn’t been seen) because very few people are migrating from 5.0.84 to 5.1.3X.

    On the whole, the worst-case behavior of the query cache is unacceptable even in the “better” stage, and given the many different operations that happen in a “worst-case system” to a “real life system”, it may be that only the worst-case system is affected by whatever was changed. Hard to know, really.

    It also might be that the performance is only affected when you’re filling the query cache, ie when it’s mostly empty and you’re adding queries….if that’s the case, slowness wouldn’t be noticed because there’s slowness when mysqld restarts due to filling the innodb_buffer_pool too.

    So….it’s hard to say 1) if the problem you found exists in real scenarios where the query cache is used and 2) if it does, the problem might be small and transient, and not really significant.

    But to answer your question, we have *not* seen a drastic reduction in queries due to upgrading and the query cache.

    (we have seen this: http://bugs.mysql.com/bug.php?id=36259 which is fixed in 5.1.37, and noticed as early as 5.0.22…, but that was after we’d defragmented some tables as part of an upgrade, since we had a long downtime window….)

  • Mark Callaghan says:

    There were some cases in 5.0.84 where lock-waits for query cache locks would timeout allowing some sessions to ignore checking the query cache when there was mutex contention. That code is not in 5.0.44 and 5.1.38. It explains my results.

  • Sheeri Cabral says:

    Mark — good to know what causes it!

  • Hide-and-SQL » Blog Archive » usefulness of the query cache says:

    […] of talk lately about the effectiveness or lack thereof of the MySQL query cache.  I’m kind of […]

  • Xaprb says:

    Sheeri,

    You said it’s not widely used — actually I’d say about 90% of new clients I see ARE using it. This week even I saw someone with a 1GB query cache (I’ve seen bigger, but not often). I think that probably the most common reason for this is that intermediate-level users turn it on when trying to get more performance from their systems.

    That’s all just wild guessing.

  • Sheeri says:

    Baron — Let me clarify — among the clients I’ve worked with that have upgraded, the query cache isn’t widely used.

    It’s true that the query cache is widely used, though how many times do you end up turning off the query cache? For me it’s about 50% of the time…..so I’m talking specifically about the subset of clients I’ve seen in the past 6 months or so….

    Pythian checks that and many other variables on our initial audits of systems; so yes, we see it being used widely, but we ensure that we recommend using it only where appropriate, and we recommend turning it off when appropriate as well.

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>