Pop Quiz – Index length

Dec 10, 2008 / By Sheeri Cabral

Tags: ,

A quiz, in 4 parts:

Given the following table definition in the sakila database:

mysql> SHOW CREATE TABLE actor\G
*********************** 1. row ***********************
       Table: actor
Create Table: CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`),
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Why is the index length of the index on last_name 182?

mysql> EXPLAIN SELECT last_name FROM actor\G
*********************** 1. row ***********************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: idx_actor_last_name
      key_len: 182
          ref: NULL
         rows: 200
        Extra: Using index
1 row in set (0.00 sec)

Please prove your answers with examples. I’ll start off in the first comment with getting you part of the way there, so you can see what a “answer” with an example looks like.

9 Responses to “Pop Quiz – Index length”

  • Sheeri Cabral says:

    The length of the varchar field is 45, but it’s a utf character set. That’s why the index length is so high — utf8 uses 4 bytes per character, so the maximum data length is 45*4=180 bytes.

    To prove this, I change the charset of last_name to latin1, 1 byte per character:

    mysql> ALTER TABLE actor MODIFY last_name varchar(45) NOT NULL, CHARSET=latin1;
    Query OK, 200 rows affected (0.13 sec)
    Records: 200 Duplicates: 0 Warnings: 0

    mysql> EXPLAIN SELECT last_name FROM actor\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: actor
    type: index
    possible_keys: NULL
    key: idx_actor_last_name
    key_len: 47
    ref: NULL
    rows: 200
    Extra: Using index
    1 row in set (0.01 sec)

    So, what accounts for the remaining 2 bytes? Please remember to prove your answer.

  • Arjen Lentz says:

    Do note which version of MySQL are you using, that it’s 4 bytes per UTF8 character rather than 3.

    And just to be a smartpants I’ll answer and prove where the remaining 2 bytes come from: a secondary index in InnoDB points to the value of the primary key, which in this case is a SMALLINT taking up 2 bytes. The proof that the primary key value resides in the row would be a query like
    EXPLAIN SELECT last_name, actor_id FROM actor WHERE …
    which will show the idx_actor_last_name index being used with the appropriate access type, and then “Using index” in the Extra field indicating that the rest of the row data did not have to be accessed – thereby proving that the value of actor_id was in the secondary index also.
    Some sensible (for the dataset) WHERE clause will be necessary, as InnoDB is otherwise likely to choose to scan the primary key.

  • kimseong says:

    The 2 bytes is the length of the data. I just wonder why 1 byte is not enough in this case since it is less than 255 characters.

    To prove, change the varchar to char and the index will not have the extra 2 bytes.

    If the field allows NULL, then it will have 1 more byte in the index to store the NULL flag.

  • Nicholas Ring says:

    To follow on with Sheeri Cabral reply.

    InnoDB non-primary indexes also include the primary key, in this case `actor_id` which is a smallint which is two bytes…

    http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html

  • Tom Krouper says:

    I thought I had this one.
    — Answer I started —
    The remaining two bytes are related to the PRIMARY KEY. Every index in an InnoDB table is preceded with the PRIMARY KEY.
    SMALLINT is 2 bytes, that would be the issue.

    However, when I switched the PRIMARY KEY to an INTEGER the key_len remained 47. I’m still digging.

  • Sheeri Cabral says:

    So far everyone has said the same thing — overhead due to primary key being clustered with the index in InnoDB. Tom was the only one who actually tried changing it and realized that no, that wasn’t the answer. (BTW, that’s why I asked specifically for proof — actual changing the table — because SMALLINT adding 2 bytes was my first assumption too).

    Keep digging! I found the answer after a few more guesses. I will give you one “gimme” — it’s not due to the nullability; in fact, there is an extra 1 byte of overhead if the field is a nullable field.

  • Sheeri Cabral says:

    Actually, kimseong got it. :)

  • kimseong says:

    There is another interesting phenomena that I observed while testing, I am only using 5.1, if you have the varchar field very long, Innodb only index 767 bytes. I guess Innodb splits it and store the first part in the row and the rest separately like Text/Blob.

  • Sheeri Cabral says:

    kimseong — that’s actually well documented:

    http://dev.mysql.com/doc/refman/5.0/en/create-index.html

    Prefix lengths are storage engine-dependent (for example, a prefix can be up to 1000 bytes long for MyISAM tables, 767 bytes for InnoDB tables). Note that prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for non-binary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set. For example, utf8 columns require up to three index bytes per character.

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>