MySQL: “SOUNDS LIKE” vs. Full-Text search
Dec 14, 2007 / By Sheeri Cabral
A friend of mine asked me:
I’m hoping you can help me out with something — I’m trying to optimize a search feature. Since it uses a MySQL database, the search already uses the
LIKEstatement to get matches for a search query, we might be needing something more flexible. I found mention on MySQL’s website about something called the
SOUNDS LIKEexpression that can be more flexible than
LIKE. Do you know anything about this? If you do, can you point me a direction where I might be able to learn more about it? Thanks in advance for your help!
I haven’t used
SOUNDS LIKE, but the MySQL manual’s pretty good. So here are my thoughts on what it has to say.
The manual’s sub-section on string functions says:
expr1 SOUNDS LIKE expr2
This is the same as
SOUNDEX(expr1) = SOUNDEX(expr2).
So let’s say your example is searching for “Sheeri”. You’d do
WHERE field SOUNDS LIKE "%Sheeri%";
And maybe you’re hoping to get fields that contain “Sheeri” and “cheery”. However, what this will do is
WHERE SOUNDEX(field) = SOUNDEX('%Sheeri%');
There is some important information in the manual’s description of the soundex function, which says that you can use
SUBSTRING to get a part of a string. But you want to search on some part of a field, and you don’t know where in that field the string might be.
One of the biggest problems with a search feature using
LIKE is that inevitably you use search terms such as
WHERE field LIKE "%Sheeri%";
MySQL can use an index on a text field, but the internal format of the index uses the first character, then the second character, and so on, just as you search for a word in a dictionary. First you get to the section that has the first letter, then flip pages to get to the second letter, etc.
However, just as it would be impossible for you to search a dictionary for all words ending in “th”, MySQL cannot use an index on a string/text field if you have a wildcard at the beginning of your comparison. MySQL can use an index if you search
WHERE field like "Sheeri%";
WHERE field like "Sh%ri%";
WHERE field like "%Sheeri";
I think what you really want is a full-text search. Full-text search can only be done against MyISAM tables. With full-text search, you can sort rows in order of relevance.
It won’t address the issue where someone spells something wrong in a search, but the
WITH QUERY EXPANSION mode of search can help that. The manual’s sub-section on Full-Text Searches with Query Expansion says it best:
It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”.
Make sure you read all the gotchas in the Full-Text Restrictions chapter.
I hope this helps!