<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.6.5" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: MySQL: &#8220;SOUNDS LIKE&#8221; vs. Full-Text search</title>
	<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search</link>
	<description>News and views from Pythian DBAs</description>
	<pubDate>Fri,  5 Dec 2008 00:36:43 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
		<item>
		<title>By: Sheeri Cabral</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140785</link>
		<dc:creator>Sheeri Cabral</dc:creator>
		<pubDate>Fri, 14 Dec 2007 18:59:28 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140785</guid>
		<description>Paul -- great discussion.

Putting fulltext search outside the database is ideal for the reasons you stated.

However, it's also not ideal for 3 reasons (people will have to weigh all the pros and cons for what's good for their environment, as there's no clear-cut "always" and "never" here.....):

0)  The features of a [relational] database are taken away -- most importantly, the ability to relate the text to other entities.  You also take away sorting, but that's easily fixable in OS's, and perhaps most fulltext engines, such as Lucene*.  A Google appliance is quite costly for just plain ol' text searching.  There are also companies out there that sell the service of searching your data quickly -- Transparensee (http://www.transparensee.com/)* is one I've heard of.

This matters, because it is a point I also make when talking about whether or not to put images in a database (see also http://sheeri.net/archives/39 ).  One of the factors is easily being able to relate the data to each other -- otherwise, you're using a database to store information, not actually relate data to each other.  If you want to just store data, you can put it all in flat files!

Being able to easily look up text is important, but so is being able to update and delete text that has some certain metadata associated with it -- maybe that's a user id, or a timestamp, etc.  And honestly, it's the same with images.  Yes, there are costs to storing large BLOBS in the database, but that's a factor, not an argument, because using a database simply for storage is a silly idea anyway.

1)  In some cases you want to perform functions on the text in part or in whole.  For example, encryption....

2)  The data is in more than one place.  This is similar to point 0 in that if you have all the data in the database, it's all in one place.  But it warrants having its own bullet point, because it's more difficult to get a reliable point-in-time snapshot of the data for backup and recovery.  You can't replicate all the data, either.

I think it would be great if DBMSs had API hooks, so you could choose whether or not to use their built-in fulltext search implementation.  That would add overhead, as the data and the search engine would have to communicate, as opposed to the application going straight to the search engine when it needs to.  However, it would also make searching a black box, and not require application changes to use.

SQL Server already uses an external solution:

http://www.developer.com/db/article.php/3446891

It uses Microsoft's search to look for data.  However, this also means that it doesn't automatically re-index when a change is made.  (See the "Indexing Considerations" part at http://www.sitepoint.com/blogs/2006/11/12/sql-server-full-text-search-protips-part-1-setup/ )

And I know MySQL is working on a FULLTEXT solution that's server-wide.  Maybe they'll include API hooks so we can drop in our own solution.

* note that Lucene and Transparensee aren't products or companies Pythian/I endorse or discourage -- those are simply examples I've heard of, but never used myself.</description>
		<content:encoded><![CDATA[<p>Paul &#8212; great discussion.</p>
<p>Putting fulltext search outside the database is ideal for the reasons you stated.</p>
<p>However, it&#8217;s also not ideal for 3 reasons (people will have to weigh all the pros and cons for what&#8217;s good for their environment, as there&#8217;s no clear-cut &#8220;always&#8221; and &#8220;never&#8221; here&#8230;..):</p>
<p>0)  The features of a [relational] database are taken away &#8212; most importantly, the ability to relate the text to other entities.  You also take away sorting, but that&#8217;s easily fixable in OS&#8217;s, and perhaps most fulltext engines, such as Lucene*.  A Google appliance is quite costly for just plain ol&#8217; text searching.  There are also companies out there that sell the service of searching your data quickly &#8212; Transparensee (http://www.transparensee.com/)* is one I&#8217;ve heard of.</p>
<p>This matters, because it is a point I also make when talking about whether or not to put images in a database (see also <a href="http://sheeri.net/archives/39" rel="nofollow">http://sheeri.net/archives/39</a> ).  One of the factors is easily being able to relate the data to each other &#8212; otherwise, you&#8217;re using a database to store information, not actually relate data to each other.  If you want to just store data, you can put it all in flat files!</p>
<p>Being able to easily look up text is important, but so is being able to update and delete text that has some certain metadata associated with it &#8212; maybe that&#8217;s a user id, or a timestamp, etc.  And honestly, it&#8217;s the same with images.  Yes, there are costs to storing large BLOBS in the database, but that&#8217;s a factor, not an argument, because using a database simply for storage is a silly idea anyway.</p>
<p>1)  In some cases you want to perform functions on the text in part or in whole.  For example, encryption&#8230;.</p>
<p>2)  The data is in more than one place.  This is similar to point 0 in that if you have all the data in the database, it&#8217;s all in one place.  But it warrants having its own bullet point, because it&#8217;s more difficult to get a reliable point-in-time snapshot of the data for backup and recovery.  You can&#8217;t replicate all the data, either.</p>
<p>I think it would be great if DBMSs had API hooks, so you could choose whether or not to use their built-in fulltext search implementation.  That would add overhead, as the data and the search engine would have to communicate, as opposed to the application going straight to the search engine when it needs to.  However, it would also make searching a black box, and not require application changes to use.</p>
<p>SQL Server already uses an external solution:</p>
<p><a href="http://www.developer.com/db/article.php/3446891" rel="nofollow">http://www.developer.com/db/article.php/3446891</a></p>
<p>It uses Microsoft&#8217;s search to look for data.  However, this also means that it doesn&#8217;t automatically re-index when a change is made.  (See the &#8220;Indexing Considerations&#8221; part at <a href="http://www.sitepoint.com/blogs/2006/11/12/sql-server-full-text-search-protips-part-1-setup/" rel="nofollow">http://www.sitepoint.com/blogs/2006/11/12/sql-server-full-text-search-protips-part-1-setup/</a> )</p>
<p>And I know MySQL is working on a FULLTEXT solution that&#8217;s server-wide.  Maybe they&#8217;ll include API hooks so we can drop in our own solution.</p>
<p>* note that Lucene and Transparensee aren&#8217;t products or companies Pythian/I endorse or discourage &#8212; those are simply examples I&#8217;ve heard of, but never used myself.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Vallee</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140780</link>
		<dc:creator>Paul Vallee</dc:creator>
		<pubDate>Fri, 14 Dec 2007 18:14:10 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140780</guid>
		<description>acasalamata: I sent an email to the address you submitted when posting your comment; unfortunately, it bounced. Here's what it said:

&lt;blockquote&gt;Hey there,

If your comments are being dropped from our blog, rest assured I'm interested in that not happening. Please let me know if you're willing to work with my team to get to the bottom of the issue.

Thanks

Paul
--
Paul Vallee, CEO, The Pythian Group, Inc.
http://www.linkedin.com/in/paulvallee
&lt;/blockquote&gt; 

My email is vallee@pythian.com; reach out and we'll figure out why your comments are being lost.</description>
		<content:encoded><![CDATA[<p>acasalamata: I sent an email to the address you submitted when posting your comment; unfortunately, it bounced. Here&#8217;s what it said:</p>
<blockquote><p>Hey there,</p>
<p>If your comments are being dropped from our blog, rest assured I&#8217;m interested in that not happening. Please let me know if you&#8217;re willing to work with my team to get to the bottom of the issue.</p>
<p>Thanks</p>
<p>Paul<br />
&#8211;<br />
Paul Vallee, CEO, The Pythian Group, Inc.<br />
<a href="http://www.linkedin.com/in/paulvallee" rel="nofollow">http://www.linkedin.com/in/paulvallee</a>
</p></blockquote>
<p>My email is <a href="mailto:vallee@pythian.com">vallee@pythian.com</a>; reach out and we&#8217;ll figure out why your comments are being lost.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Vallee</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140775</link>
		<dc:creator>Paul Vallee</dc:creator>
		<pubDate>Fri, 14 Dec 2007 18:06:32 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140775</guid>
		<description>&gt; You sure block people’s comments.

No, we don't. If you posted and it did not show up here, let me know and we can work together to figure it out. We only moderate for spam.

Paul</description>
		<content:encoded><![CDATA[<p>> You sure block people’s comments.</p>
<p>No, we don&#8217;t. If you posted and it did not show up here, let me know and we can work together to figure it out. We only moderate for spam.</p>
<p>Paul</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: acasalamata</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140769</link>
		<dc:creator>acasalamata</dc:creator>
		<pubDate>Fri, 14 Dec 2007 17:52:20 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140769</guid>
		<description>You sure block people's comments.</description>
		<content:encoded><![CDATA[<p>You sure block people&#8217;s comments.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gigiduru</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140766</link>
		<dc:creator>gigiduru</dc:creator>
		<pubDate>Fri, 14 Dec 2007 17:34:50 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140766</guid>
		<description>If you have that kind of money, I should ask you the same question: why don't you migrate to Oracle? That way you solve all your problems with the glitches an open source product might have, plus you have technical support 24x7. It's simple: you pay a boatload of money for a solid, reliable product and you can rest assured that your RDBMS will run smoothly.
   If you don't have that kind of money, but you still wanna migrate to a different RDBMS, I'd choose PostgreSQL for three reasons:
1. I know very well what MySQL can do and CAN'T do. The "CAN'T do" part is astonishingly big, in total contrast with what MySQL AB touts on their website. "The right tool for the right job" principle is good in theory.

2. You cannot be RDBMS agnostic, pick whatever falls into your hands, and consider it fit for the job. Not after I had to run REPAIR TABLE on a MyISAM table with 700+ million rows, after some insert delay glitches.
It's simply not professional to recommend something like this to your clients, knowing that they might lose data and have downtime.

3. Just look at http://www.postgresql.org/docs/8.3/static/release-8-3.html it's compelling enough to make da move. Until it's out ... hang on, your Saviour is on the way. It'll be a steep learning curve but I'm sure you'll make it.
 
By the way, my day to day work is MySQL DBA. Hopefully, it'll not be MySQL for a long time. 
Heck, even SQLite is better than MySQL for the simple reason that it does what it states. Meeting expectations is everything.</description>
		<content:encoded><![CDATA[<p>If you have that kind of money, I should ask you the same question: why don&#8217;t you migrate to Oracle? That way you solve all your problems with the glitches an open source product might have, plus you have technical support 24&#215;7. It&#8217;s simple: you pay a boatload of money for a solid, reliable product and you can rest assured that your RDBMS will run smoothly.<br />
   If you don&#8217;t have that kind of money, but you still wanna migrate to a different RDBMS, I&#8217;d choose PostgreSQL for three reasons:<br />
1. I know very well what MySQL can do and CAN&#8217;T do. The &#8220;CAN&#8217;T do&#8221; part is astonishingly big, in total contrast with what MySQL AB touts on their website. &#8220;The right tool for the right job&#8221; principle is good in theory.</p>
<p>2. You cannot be RDBMS agnostic, pick whatever falls into your hands, and consider it fit for the job. Not after I had to run REPAIR TABLE on a MyISAM table with 700+ million rows, after some insert delay glitches.<br />
It&#8217;s simply not professional to recommend something like this to your clients, knowing that they might lose data and have downtime.</p>
<p>3. Just look at <a href="http://www.postgresql.org/docs/8.3/static/release-8-3.html" rel="nofollow">http://www.postgresql.org/docs/8.3/static/release-8-3.html</a> it&#8217;s compelling enough to make da move. Until it&#8217;s out &#8230; hang on, your Saviour is on the way. It&#8217;ll be a steep learning curve but I&#8217;m sure you&#8217;ll make it.</p>
<p>By the way, my day to day work is MySQL DBA. Hopefully, it&#8217;ll not be MySQL for a long time.<br />
Heck, even SQLite is better than MySQL for the simple reason that it does what it states. Meeting expectations is everything.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Vallee</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140759</link>
		<dc:creator>Paul Vallee</dc:creator>
		<pubDate>Fri, 14 Dec 2007 17:12:05 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140759</guid>
		<description>I think an interesting discussion can be salvaged from this. Let's try.

Let's assume for a minute that the full-text feature can be taken out of the existing database and the db migration objection can be cast aside.

You should know I'm no apologist for MySQL or even for Oracle for that matter. I'm all about the data, not the database platform. So...

I'm interested in arguments justifying building that full-text feature in the database at all, given the existence of the Google appliance. Assuming PostgreSQL's full-text features are so kickin' compared to MySQL's, talk to us about that; I'm eager to learn. Finally, if we're gonna spend some money and it needs to be in a database, why wouldn't we just spend $5000 on Oracle SE1 on a single quad-core CPU loaded with gads of RAM?

Paul</description>
		<content:encoded><![CDATA[<p>I think an interesting discussion can be salvaged from this. Let&#8217;s try.</p>
<p>Let&#8217;s assume for a minute that the full-text feature can be taken out of the existing database and the db migration objection can be cast aside.</p>
<p>You should know I&#8217;m no apologist for MySQL or even for Oracle for that matter. I&#8217;m all about the data, not the database platform. So&#8230;</p>
<p>I&#8217;m interested in arguments justifying building that full-text feature in the database at all, given the existence of the Google appliance. Assuming PostgreSQL&#8217;s full-text features are so kickin&#8217; compared to MySQL&#8217;s, talk to us about that; I&#8217;m eager to learn. Finally, if we&#8217;re gonna spend some money and it needs to be in a database, why wouldn&#8217;t we just spend $5000 on Oracle SE1 on a single quad-core CPU loaded with gads of RAM?</p>
<p>Paul</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gigiduru</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140757</link>
		<dc:creator>gigiduru</dc:creator>
		<pubDate>Fri, 14 Dec 2007 17:04:05 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140757</guid>
		<description>Methinks you're wrong about me being a rookie, but keep smelling; who knows what you'll get into.
Also, methinks it's bad to adopt a toy to do a man's job. Migrating from one db to another is just a matter of time: the sooner the better, especially when you have to migrate from a toy to a real open source RDBMS that's also much more robust and reliable.
  I'd rather shoot myself than ever use MyISAM again. At the very least, if I'm forced to use MySQL, I'll use InnoDB, but that, as you probably noticed, defeats the purpose of Sheeri's post.
  Good luck!</description>
		<content:encoded><![CDATA[<p>Methinks you&#8217;re wrong about me being a rookie, but keep smelling; who knows what you&#8217;ll get into.<br />
Also, methinks it&#8217;s bad to adopt a toy to do a man&#8217;s job. Migrating from one db to another is just a matter of time: the sooner the better, especially when you have to migrate from a toy to a real open source RDBMS that&#8217;s also much more robust and reliable.<br />
  I&#8217;d rather shoot myself than ever use MyISAM again. At the very least, if I&#8217;m forced to use MySQL, I&#8217;ll use InnoDB, but that, as you probably noticed, defeats the purpose of Sheeri&#8217;s post.<br />
  Good luck!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Vallee</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140752</link>
		<dc:creator>Paul Vallee</dc:creator>
		<pubDate>Fri, 14 Dec 2007 16:39:37 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140752</guid>
		<description>Umm, if they have already adopted MySQL I'm pretty sure that's incredibly bad advice you're providing there. Do you have any idea how much money is involved in a database migration project, all costed out? Methinks I smell a rookie.</description>
		<content:encoded><![CDATA[<p>Umm, if they have already adopted MySQL I&#8217;m pretty sure that&#8217;s incredibly bad advice you&#8217;re providing there. Do you have any idea how much money is involved in a database migration project, all costed out? Methinks I smell a rookie.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gigiduru</title>
		<link>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140750</link>
		<dc:creator>gigiduru</dc:creator>
		<pubDate>Fri, 14 Dec 2007 16:31:49 +0000</pubDate>
		<guid>http://www.pythian.com/blogs/735/mysql-sounds-like-vs-full-text-search#comment-140750</guid>
		<description>How about telling your friend to quit using this MySQL toy and install something that natively supports fulltext search? Something like PostgreSQL 8.3, which is around the corner.
  Remember, friends don't let friends use MySQL.</description>
		<content:encoded><![CDATA[<p>How about telling your friend to quit using this MySQL toy and install something that natively supports fulltext search? Something like PostgreSQL 8.3, which is around the corner.<br />
  Remember, friends don&#8217;t let friends use MySQL.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
