<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Pop Quiz:  Index length</title>
	<atom:link href="http://www.pythian.com/news/1417/pop-quiz-index-length/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pythian.com/news/1417/pop-quiz-index-length/</link>
	<description>News and views from Pythian DBAs</description>
	<lastBuildDate>Fri, 10 Feb 2012 13:01:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Sheeri Cabral</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312553</link>
		<dc:creator>Sheeri Cabral</dc:creator>
		<pubDate>Thu, 11 Dec 2008 15:19:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312553</guid>
		<description>kimseong -- that&#039;s actually well documented:  

&lt;a HREF=&quot;http://dev.mysql.com/doc/refman/5.0/en/create-index.html&quot; rel=&quot;nofollow&quot;&gt;http://dev.mysql.com/doc/refman/5.0/en/create-index.html&lt;/A&gt;

&lt;I&gt;Prefix lengths are storage engine-dependent (for example, a prefix can be up to 1000 bytes long for MyISAM tables, 767 bytes for InnoDB tables). Note that prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for non-binary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set. For example, utf8 columns require up to three index bytes per character. &lt;/I&gt;</description>
		<content:encoded><![CDATA[<p>kimseong &#8212; that&#8217;s actually well documented:  </p>
<p><a HREF="http://dev.mysql.com/doc/refman/5.0/en/create-index.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.0/en/create-index.html</a></p>
<p><i>Prefix lengths are storage engine-dependent (for example, a prefix can be up to 1000 bytes long for MyISAM tables, 767 bytes for InnoDB tables). Note that prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for non-binary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set. For example, utf8 columns require up to three index bytes per character. </i></p>
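<p>A minimal sketch of the bytes-versus-characters arithmetic the manual describes. It assumes the byte limits quoted above (767 for InnoDB, 1000 for MyISAM) and a worst-case bytes-per-character figure for the column's character set. The name <code>max_prefix_chars</code> is hypothetical, not a MySQL API:</p>

```python
# Hypothetical helper (not a MySQL API): turns an engine's index prefix
# limit, which the manual gives in bytes, into the largest prefix length
# in characters that CREATE INDEX can accept for a given character set.
PREFIX_LIMIT_BYTES = {
    "InnoDB": 767,   # figure quoted from the 5.0 manual above
    "MyISAM": 1000,  # figure quoted from the 5.0 manual above
}


def max_prefix_chars(engine, max_bytes_per_char):
    """Largest prefix (in characters) whose worst-case size fits the limit."""
    limit = PREFIX_LIMIT_BYTES[engine]
    # Round down: every character may need max_bytes_per_char bytes.
    return limit // max_bytes_per_char


print(max_prefix_chars("InnoDB", 3))  # utf8 (3 bytes/char): 255
print(max_prefix_chars("MyISAM", 3))  # utf8 (3 bytes/char): 333
print(max_prefix_chars("InnoDB", 1))  # latin1 (1 byte/char): 767
```

<p>Rounding down matters because the engine checks the worst case: 256 utf8 characters could need 768 bytes, one more than InnoDB allows.</p>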
]]></content:encoded>
	</item>
	<item>
		<title>By: kimseong</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312298</link>
		<dc:creator>kimseong</dc:creator>
		<pubDate>Thu, 11 Dec 2008 05:29:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312298</guid>
		<description>There is another interesting phenomenon that I observed while testing (I am only using 5.1): if the varchar field is very long, InnoDB only indexes 767 bytes. I guess InnoDB splits it, storing the first part in the row and the rest separately, like TEXT/BLOB.</description>
		<content:encoded><![CDATA[<p>There is another interesting phenomenon that I observed while testing (I am only using 5.1): if the varchar field is very long, InnoDB only indexes 767 bytes. I guess InnoDB splits it, storing the first part in the row and the rest separately, like TEXT/BLOB.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sheeri Cabral</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312273</link>
		<dc:creator>Sheeri Cabral</dc:creator>
		<pubDate>Thu, 11 Dec 2008 04:08:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312273</guid>
		<description>Actually, kimseong got it.  :)</description>
		<content:encoded><![CDATA[<p>Actually, kimseong got it.  :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sheeri Cabral</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312272</link>
		<dc:creator>Sheeri Cabral</dc:creator>
		<pubDate>Thu, 11 Dec 2008 04:07:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312272</guid>
		<description>So far everyone has said the same thing -- overhead due to primary key being clustered with the index in InnoDB.  Tom was the only one who actually tried changing it and realized that no, that wasn&#039;t the answer.  (BTW, that&#039;s why I asked specifically for proof -- actually changing the table -- because SMALLINT adding 2 bytes was my first assumption too).

Keep digging!  I found the answer after a few more guesses.  I will give you one &quot;gimme&quot; -- it&#039;s not due to the nullability; in fact, there is an extra 1 byte of overhead if the field is a nullable field.</description>
		<content:encoded><![CDATA[<p>So far everyone has said the same thing &#8212; overhead due to primary key being clustered with the index in InnoDB.  Tom was the only one who actually tried changing it and realized that no, that wasn&#8217;t the answer.  (BTW, that&#8217;s why I asked specifically for proof &#8212; actually changing the table &#8212; because SMALLINT adding 2 bytes was my first assumption too).</p>
<p>Keep digging!  I found the answer after a few more guesses.  I will give you one &#8220;gimme&#8221; &#8212; it&#8217;s not due to the nullability; in fact, there is an extra 1 byte of overhead if the field is a nullable field.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Krouper</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312248</link>
		<dc:creator>Tom Krouper</dc:creator>
		<pubDate>Thu, 11 Dec 2008 03:45:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312248</guid>
		<description>I thought I had this one.
-- Answer I started --
The remaining two bytes are related to the PRIMARY KEY. Every index in an InnoDB table is preceded with the PRIMARY KEY.
SMALLINT is 2 bytes, that would be the issue.
--
However, when I switched the PRIMARY KEY to an INTEGER the key_len remained 47. I&#039;m still digging.</description>
		<content:encoded><![CDATA[<p>I thought I had this one.<br />
&#8211; Answer I started &#8211;<br />
The remaining two bytes are related to the PRIMARY KEY. Every index in an InnoDB table is preceded with the PRIMARY KEY.<br />
SMALLINT is 2 bytes, that would be the issue.<br />
&#8211;<br />
However, when I switched the PRIMARY KEY to an INTEGER the key_len remained 47. I&#8217;m still digging.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nicholas Ring</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312246</link>
		<dc:creator>Nicholas Ring</dc:creator>
		<pubDate>Thu, 11 Dec 2008 03:43:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312246</guid>
		<description>To follow on from Sheeri Cabral&#039;s reply:

InnoDB non-primary indexes also include the primary key, in this case `actor_id`, which is a SMALLINT and therefore two bytes...

http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html</description>
		<content:encoded><![CDATA[<p>To follow on from Sheeri Cabral&#8217;s reply:</p>
<p>InnoDB non-primary indexes also include the primary key, in this case `actor_id`, which is a SMALLINT and therefore two bytes&#8230;</p>
<p><a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kimseong</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312245</link>
		<dc:creator>kimseong</dc:creator>
		<pubDate>Thu, 11 Dec 2008 03:42:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312245</guid>
		<description>The 2 bytes store the length of the data. I just wonder why 1 byte is not enough in this case, since the column is shorter than 255 characters.

To prove it, change the VARCHAR to CHAR and the index will no longer have the extra 2 bytes.


If the field allows NULL, then it will have 1 more byte in the index to store the NULL flag.</description>
		<content:encoded><![CDATA[<p>The 2 bytes store the length of the data. I just wonder why 1 byte is not enough in this case, since the column is shorter than 255 characters.</p>
<p>To prove it, change the VARCHAR to CHAR and the index will no longer have the extra 2 bytes.</p>
<p>If the field allows NULL, then it will have 1 more byte in the index to store the NULL flag.</p>
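<p>The rule described above can be sketched as a small calculator. It assumes, as observed in this thread, 2 length bytes for a VARCHAR key part and 1 extra byte when the column is nullable. The function <code>estimate_key_len</code> and its parameters are illustrative, not part of MySQL:</p>

```python
def estimate_key_len(chars, bytes_per_char, varchar=True, nullable=False):
    """Estimate EXPLAIN key_len for a single-column string index (sketch)."""
    length = chars * bytes_per_char  # worst-case data bytes for the column
    if varchar:
        length += 2  # length prefix seen for VARCHAR in this thread
    if nullable:
        length += 1  # NULL flag byte
    return length


print(estimate_key_len(45, 1))                 # VARCHAR(45) latin1 NOT NULL: 47
print(estimate_key_len(45, 1, varchar=False))  # CHAR(45) latin1 NOT NULL: 45
print(estimate_key_len(45, 1, nullable=True))  # VARCHAR(45) latin1 NULL: 48
```

<p>The first call reproduces the key_len of 47 from the latin1 EXPLAIN in this thread. The CHAR case drops the 2 length bytes, which matches the test suggested above.</p>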
]]></content:encoded>
	</item>
	<item>
		<title>By: Arjen Lentz</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312223</link>
		<dc:creator>Arjen Lentz</dc:creator>
		<pubDate>Thu, 11 Dec 2008 02:46:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312223</guid>
		<description>Do note that, depending on which version of MySQL you are using, it&#039;s 4 bytes per UTF8 character rather than 3.

And just to be a smartpants I&#039;ll answer and prove where the remaining 2 bytes come from: a secondary index in InnoDB points to the value of the primary key, which in this case is a SMALLINT taking up 2 bytes. The proof that the primary key value resides in the secondary index would be a query like
EXPLAIN SELECT last_name, actor_id FROM actor WHERE ...
which will show the idx_actor_last_name index being used with the appropriate access type, and then &quot;Using index&quot; in the Extra field indicating that the rest of the row data did not have to be accessed - thereby proving that the value of actor_id was in the secondary index also.
Some sensible (for the dataset) WHERE clause will be necessary, as InnoDB is otherwise likely to choose to scan the primary key.</description>
		<content:encoded><![CDATA[<p>Do note that, depending on which version of MySQL you are using, it&#8217;s 4 bytes per UTF8 character rather than 3.</p>
<p>And just to be a smartpants I&#8217;ll answer and prove where the remaining 2 bytes come from: a secondary index in InnoDB points to the value of the primary key, which in this case is a SMALLINT taking up 2 bytes. The proof that the primary key value resides in the secondary index would be a query like<br />
EXPLAIN SELECT last_name, actor_id FROM actor WHERE &#8230;<br />
which will show the idx_actor_last_name index being used with the appropriate access type, and then &#8220;Using index&#8221; in the Extra field indicating that the rest of the row data did not have to be accessed &#8211; thereby proving that the value of actor_id was in the secondary index also.<br />
Some sensible (for the dataset) WHERE clause will be necessary, as InnoDB is otherwise likely to choose to scan the primary key.</p>
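<p>Here is a toy Python model of the idea above. It makes no claims about InnoDB's on-disk format: each secondary-index entry stores the indexed value plus the primary key, so a query that needs only <code>last_name</code> and <code>actor_id</code> never has to visit the row. That is what &#8220;Using index&#8221; reports. The sample rows and function names are made up for illustration:</p>

```python
# Toy model only (not InnoDB internals); the sample rows are invented.
# Clustered index: primary key actor_id maps to the full row.
rows = {
    1: {"actor_id": 1, "first_name": "ANNA", "last_name": "SMITH"},
    2: {"actor_id": 2, "first_name": "BEN", "last_name": "JONES"},
    3: {"actor_id": 3, "first_name": "CARL", "last_name": "JONES"},
}

# Secondary index on last_name: sorted (last_name, actor_id) pairs,
# so the primary key value travels with every entry.
secondary = sorted((r["last_name"], pk) for pk, r in rows.items())


def covering_lookup(last_name):
    """SELECT last_name, actor_id ... WHERE last_name = ?, index only."""
    # Linear scan for brevity; a real B-tree would seek to the key.
    return [(name, pk) for name, pk in secondary if name == last_name]


def full_lookup(last_name):
    """Any other column needs a second trip into the clustered index."""
    return [rows[pk] for name, pk in secondary if name == last_name]


print(covering_lookup("JONES"))  # [('JONES', 2), ('JONES', 3)]
```

<p><code>covering_lookup</code> never reads <code>rows</code>. <code>full_lookup</code> must use the stored primary key to fetch each row, which is the extra work a covering index avoids.</p>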
]]></content:encoded>
	</item>
	<item>
		<title>By: Sheeri Cabral</title>
		<link>http://www.pythian.com/news/1417/pop-quiz-index-length/#comment-312210</link>
		<dc:creator>Sheeri Cabral</dc:creator>
		<pubDate>Thu, 11 Dec 2008 02:13:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/1417/pop-quiz-index-length#comment-312210</guid>
		<description>The length of the varchar field is 45, but it&#039;s a utf8 character set.  That&#039;s why the index length is so high -- utf8 uses 4 bytes per character, so the maximum data length is 45*4=180 bytes.

To prove this, I changed the charset of last_name to latin1, 1 byte per character:

mysql&gt; ALTER TABLE actor MODIFY last_name varchar(45) NOT NULL, CHARSET=latin1;
Query OK, 200 rows affected (0.13 sec)
Records: 200  Duplicates: 0  Warnings: 0


mysql&gt; EXPLAIN SELECT last_name FROM actor\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: idx_actor_last_name
      key_len: 47
          ref: NULL
         rows: 200
        Extra: Using index
1 row in set (0.01 sec)

So, what accounts for the remaining 2 bytes?  Please remember to prove your answer.</description>
		<content:encoded><![CDATA[<p>The length of the varchar field is 45, but it&#8217;s a utf8 character set.  That&#8217;s why the index length is so high &#8212; utf8 uses 4 bytes per character, so the maximum data length is 45*4=180 bytes.</p>
<p>To prove this, I changed the charset of last_name to latin1, 1 byte per character:</p>
<p>mysql> ALTER TABLE actor MODIFY last_name varchar(45) NOT NULL, CHARSET=latin1;<br />
Query OK, 200 rows affected (0.13 sec)<br />
Records: 200  Duplicates: 0  Warnings: 0</p>
<p>mysql> EXPLAIN SELECT last_name FROM actor\G<br />
*************************** 1. row ***************************<br />
           id: 1<br />
  select_type: SIMPLE<br />
        table: actor<br />
         type: index<br />
possible_keys: NULL<br />
          key: idx_actor_last_name<br />
      key_len: 47<br />
          ref: NULL<br />
         rows: 200<br />
        Extra: Using index<br />
1 row in set (0.01 sec)</p>
<p>So, what accounts for the remaining 2 bytes?  Please remember to prove your answer.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

