There is more than one way to do it….

Sep 28, 2009 / By Sheeri Cabral

Tags:

I spent Friday examining the systems for a more traditional consulting gig (in case you did not know, Pythian has offered more traditional consulting and emergency services in addition to our remote DBA offering for our entire 12 year history). It is a familiar story to many people — the database performance was unacceptable.

The company had a few log tables that had huge amounts of inserts and foreign keys, so they used InnoDB. Unfortunately, they also used a GUID as a primary key (which is getting more and more common these days, but long primary keys with InnoDB slow down INSERT, UPDATE and DELETE commands a great deal) — varchar(32) and utf8.

That’s right — their primary key for many of these tables was 96 bytes long (32 characters * 3 bytes per character), and as an InnoDB table, the primary key is clustered with every other index. One of the more frequently used tables had 8 indexes other than the PRIMARY KEY — 7 were indexes on varchar(32) fields, the other was on an int(11) field (signed integer).

The query load on the server was 99% writes, 1% reads. The log tables only had rows inserted, not deleted or updated. Thus, we offered a few different solutions (in no particular order):

1) Use a BLACKHOLE table on the master and replicate to a slave server. This would make the INSERTs to the master fast, though the slave would lag behind. The slave could be changed to MyISAM or ARCHIVE or some other storage engine.

2) Use MyISAM on the master and slave. This was the solution we went to, because the log tables could be re-created with no data and moved into place, and the loss of foreign keys was not a big deal, since the application checked for them anyway. ARCHIVE could also be used, it is faster than MyISAM on inserts because it only allows one index, the PRIMARY KEY. concurrent_insert was set at the default of 1, but we changed it to 2 just in case there was an errant UPDATE or DELETE causing a gap in the MyISAM table.

3) Add a PRIMARY KEY that is AUTO_INCREMENT, keeping the GUID a UNIQUE KEY. This would speed up the writes without sacrificing any data. This can be combined with dropping unnecessary foreign keys to make the inserts faster.

4) Move the log tables to a different machine entirely. This would require the application to have two different database handlers.

5) Use INSERT DELAYED into the logging tables.

We discussed these scenarios and the pros and cons, and eventually we chose to do #2, which we could do right away, and indeed, we had an immediate positive impact. Almost all the unnacceptable performance went away.

We were also able to audit the system and provide several good recommendations for ways they can improve their system. It is exciting for me to be able to help people out in a very tangible way — often we do not see some of the really crazy problems that affect few people (such as slave lag getting further and further behind) because our remote DBA clients have the benefit of ongoing help from us, and we generally find the smaller degradations in performance before they become a large issue.

Being able to spend a few hours with a new environment and really help them out was extremely fulfilling for me. I love that MySQL has many different options (and there are likely more that I did not think of) that all would work to achieve better performance. I think people like to have a choice, too, so they do not feel they were forced to do something they did not want to do. Especially when there are solutions on both the DBA and developer side.

4 Responses to “There is more than one way to do it….”

  • Shlomi Noach says:

    Hi Sheeri,
    Good writing. I’ve actually worked with a European company who told me that according to some law, user data *must* be identified by GUID (or otherwise 32 hex digits), so as not to have a “simple” number, possible auto-increment, which can be used by hackers to look for users’ data.
    Meaning, the purpose was to make it harder for hackers to get people’s details by guessing their primary key value.

    Regards,
    Shlomi

  • Sheeri Cabral says:

    Shlomi — but a hacker can only get information by primary key if the primary key is *used* in a retrieval system. The suggestion was to keep everything as it is, but change the primary key to an auto_increment — lookups, etc are still done by GUID. The code wouldn’t change, so hacking into the code wouldn’t allow a hacker to use the “simple” number — it is transparent to the code.

    And if the hacker hacks into the database, they can just do SELECT * FROM tbl LIMIT x,y to get values.

  • Shlomi Noach says:

    Sheeri,

    I’m assuming hacker couldn’t get direct access to db, but rather use XSS/SQL injection.

    Anyway, I’m just sharing this information. If I’m not mistaken, this law was from Germany.

  • Sheeri Cabral says:

    Shlomi — I appreciate the information! I wonder if the law is “no auto-increment” period, or “no easy way to get the data” — in the US laws are more like the latter, and thus it’s OK for the suggestion.

    Plus, I like to put out all possible suggestions, and then discuss the pros and cons. For example, solution #4 wasn’t a possibility because this company is very small and doesn’t have another server to put another database on. There’s many reasons why one solution will work better or won’t work at all — the idea is to start with all the ideas, and work from there. (and yes, ideas may include “move to a different database like Postgres” or “keep the info in a text log file” too).

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>