Implementing Fuzzy search in SQL server
Fuzzy Search in SQL Server is not done very well. This post will cover how to implement Fuzzy Search capabilities using several approaches.
What is it?
Fuzzy Search is the process to locate records that are relevant to a search, even when the search criteria doesn't match. Fuzzy Searches are used to:- Suggest the correct spelling of a word ("Did you mean this...").
- Find results related to your search term ("You might also like...").
- Finding synonyms for search terms (Search for Dog and also get results for Puppies and Pets).
- Probably other things...
What Does it Fix?
As suggested above, Fuzzy Search logic is great for when you or the users don't know the exact term they're looking for. For example, let's say I want to find the latest James Bond movie, but only vaguely know its name. An IMDB search for "Specter" brings up the James Bond "Spectre" movie as a suggestion today. If you actually search for the term, you'll get a wide variety of increasingly weird results (And also the right answer!). I don't have access to the IMDB code, but I believe they're using a combination of the Levenshtein Distance, Trigrams, and maybe Double Metaphones. Coincidentally, those are exactly what we're covering here. [caption id="attachment_81169" align="alignnone" width="360"] An example search on IMDB[/caption]Fuzzy Search in SQL Server
Before implementing Fuzzy Search in SQL Server, I'm going to define what each function does. As you'll see, they are all complementary to each other and can be used together to return a wide range of results that would be missed with traditional queries or even just one of these functions. The three functions we're going to look at are:- Levenshtein Distance
- Trigrams
- Double Metaphones