sort_buffer_size and Knowing Why

Posted in: Technical Track

In How to tune MySQL’s sort_buffer_size, Baron gives a condescending viewpoint on how to tune the sort_buffer_size variable in MySQL. In a much-nicer-nutshell, his advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning, he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part against the “Battle against any guess.” Baron’s recommendation to leave sort_buffer_size as the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) does not explain the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer.

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice on face value. Ask why, understand why, and then think if there is another level. It is not always easy; often you think you understand but really you miss that other level – such as with sort_buffer_size.

Interested in working with Sheeri? Schedule a tech call.

5 Comments. Leave new

More on understanding sort_buffer_size | MySQL Expert | MySQL Performance | MySQL Consulting
May 10, 2010 4:48 pm

[…] sort_buffer_size and Knowing Why sort_buffer_size and Knowing Why Read Buffer performance hit more on malloc() speed […]

Reply
Ronald Bradford
May 10, 2010 5:43 pm

My 2 cents worth of more of the “why” which developers and DBA’s need to know.

http://ronaldbradford.com/blog/more-on-understanding-sort_buffer_size-2010-05-10/

Reply
BAAG, Best Practices and Multiple Choice Exams | The Pythian Blog
May 10, 2010 9:34 pm

[…] been following the discussion in various MySQL blogs regarding the sort_buffer_size parameters. As an Oracle DBA, I don’t have an […]

Reply
Venu Anuganti Blog » MySQL Query Engine Scalability Issues
May 11, 2010 3:27 am

[…] there are number of community blog posts on this sort_buffer_size scalability (here, here and here). The main problem is not because of memory allocation, but its because of making array of […]

Reply

Thank you for taking the time to explain. I just want to add that sometimes you don’t have other choice other than tweaking mysql configs like sort_buffer_size. We are using a third-party application which we can’t change for upgrade and licensing reasons and the only choice left to us is tweaking indexes, memory related variables and that’s it. Baron’s article is little bit off in my opinion.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *