Pop Quiz – Index length

Posted in: Technical Track

A quiz, in 4 parts:

Given the following table definition in the sakila database:

mysql> SHOW CREATE TABLE actor\G
*********************** 1. row ***********************
       Table: actor
Create Table: CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`),
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Why is the index length of the index on last_name 182?

mysql> EXPLAIN SELECT last_name FROM actor\G
*********************** 1. row ***********************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: idx_actor_last_name
      key_len: 182
          ref: NULL
         rows: 200
        Extra: Using index
1 row in set (0.00 sec)

Please prove your answers with examples. I’ll start off in the first comment with getting you part of the way there, so you can see what a “answer” with an example looks like.

Interested in working with Sheeri? Schedule a tech call.

9 Comments. Leave new

The length of the varchar field is 45, but it’s a utf character set. That’s why the index length is so high — utf8 uses 4 bytes per character, so the maximum data length is 45*4=180 bytes.

To prove this, I change the charset of last_name to latin1, 1 byte per character:

mysql> ALTER TABLE actor MODIFY last_name varchar(45) NOT NULL, CHARSET=latin1;
Query OK, 200 rows affected (0.13 sec)
Records: 200 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT last_name FROM actor\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: actor
type: index
possible_keys: NULL
key: idx_actor_last_name
key_len: 47
ref: NULL
rows: 200
Extra: Using index
1 row in set (0.01 sec)

So, what accounts for the remaining 2 bytes? Please remember to prove your answer.

Reply

Do note which version of MySQL are you using, that it’s 4 bytes per UTF8 character rather than 3.

And just to be a smartpants I’ll answer and prove where the remaining 2 bytes come from: a secondary index in InnoDB points to the value of the primary key, which in this case is a SMALLINT taking up 2 bytes. The proof that the primary key value resides in the row would be a query like
EXPLAIN SELECT last_name, actor_id FROM actor WHERE …
which will show the idx_actor_last_name index being used with the appropriate access type, and then “Using index” in the Extra field indicating that the rest of the row data did not have to be accessed – thereby proving that the value of actor_id was in the secondary index also.
Some sensible (for the dataset) WHERE clause will be necessary, as InnoDB is otherwise likely to choose to scan the primary key.

Reply

The 2 bytes is the length of the data. I just wonder why 1 byte is not enough in this case since it is less than 255 characters.

To prove, change the varchar to char and the index will not have the extra 2 bytes.

If the field allows NULL, then it will have 1 more byte in the index to store the NULL flag.

Reply

To follow on with Sheeri Cabral reply.

InnoDB non-primary indexes also include the primary key, in this case `actor_id` which is a smallint which is two bytes…

http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html

Reply

I thought I had this one.
— Answer I started —
The remaining two bytes are related to the PRIMARY KEY. Every index in an InnoDB table is preceded with the PRIMARY KEY.
SMALLINT is 2 bytes, that would be the issue.

However, when I switched the PRIMARY KEY to an INTEGER the key_len remained 47. I’m still digging.

Reply

So far everyone has said the same thing — overhead due to primary key being clustered with the index in InnoDB. Tom was the only one who actually tried changing it and realized that no, that wasn’t the answer. (BTW, that’s why I asked specifically for proof — actual changing the table — because SMALLINT adding 2 bytes was my first assumption too).

Keep digging! I found the answer after a few more guesses. I will give you one “gimme” — it’s not due to the nullability; in fact, there is an extra 1 byte of overhead if the field is a nullable field.

Reply

Actually, kimseong got it. :)

Reply

There is another interesting phenomena that I observed while testing, I am only using 5.1, if you have the varchar field very long, Innodb only index 767 bytes. I guess Innodb splits it and store the first part in the row and the rest separately like Text/Blob.

Reply

kimseong — that’s actually well documented:

http://dev.mysql.com/doc/refman/5.0/en/create-index.html

Prefix lengths are storage engine-dependent (for example, a prefix can be up to 1000 bytes long for MyISAM tables, 767 bytes for InnoDB tables). Note that prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for non-binary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set. For example, utf8 columns require up to three index bytes per character.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *