Where can I learn more about the Google search "did you mean" algorithm?

AlgorithmNlpSpell Checking

Algorithm Problem Overview


> Possible Duplicate:
> How do you implement a “Did you mean”?

I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine:

alt text

Is there source code available for such a thing or where can I find articles that would help me to build my own?

Algorithm Solutions


Solution 1 - Algorithm

You should check out Peter Norvigs article about implementing the spell checker in a few lines of python: How to Write a Spelling Corrector It also has links for implementations in other languages (i.e. C#)

Solution 2 - Algorithm

I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.

Disclaimer: This may have just been part of their algorithm

Solution 3 - Algorithm

Python has a module called difflib. It provides a functionality called get_close_matches. From the Python Documentation:

> get_close_matches(word, possibilities[, n][, cutoff]) > > Return a list of the best "good > enough" matches. word is a sequence > for which close matches are desired > (typically a string), and > possibilities is a list of sequences against which to match > word (typically a list of strings). > > Optional argument n (default > 3) is the maximum number of close > matches to return; n must be > greater than 0. > > Optional argument cutoff (default > 0.6) is a float in the range [0, > 1]. Possibilities that don't score > at least that similar to word are > ignored. > > The best (no more than n) matches > among the possibilities are returned > in a list, sorted by similarity > score, most similar first. >

  >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
  ['apple', 'ape']
  >>> import keyword
  >>> get_close_matches('wheel', keyword.kwlist)
  ['while']
  >>> get_close_matches('apple', keyword.kwlist)
  []
  >>> get_close_matches('accept', keyword.kwlist)
  ['except']

Could this library help you?

Solution 4 - Algorithm

You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html which would give a similar functionality.

Solution 5 - Algorithm

You can check out the source code for Xapian which provides this functionality, as do a lot of other search libraries. http://xapian.org/

Solution 6 - Algorithm

I am not sure if it serves your purpose but a String Edit distance Algorithm with a dictionary might suffice for a small Application.

Solution 7 - Algorithm

I'd take a look at this article on google bombing. It shows that it just suggests answers based off previously entered results.

Solution 8 - Algorithm

AFAIK the "did you mean ?" feature doesn't check the spelling. It only gives you another query based on the content parsed by google.

Solution 9 - Algorithm

A great chapter to this topic can be found in the openly available Introduction to Information Retrieval.

Solution 10 - Algorithm

U could use ngram for the comparisment: http://en.wikipedia.org/wiki/N-gram

Using python ngram module: http://packages.python.org/ngram/index.html

import ngram

G2 = ngram.NGram([  "iis7 configure ftp 7.5",
                    "ubunto configre 8.5",
                    "mac configure ftp"])

print "String", "\t", "Similarity"
for i in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print i[0], "\t", i[1]

U get:

>>> 
String 	Similarity
"iis7 configure ftp 7.5" 	0.76
"mac configure ftp 	0.24"
"ubunto configre 8.5" 	0.19

Solution 11 - Algorithm

take a look at Levenshtein-Automata

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionvidhiView Question on Stackoverflow
Solution 1 - AlgorithmBrokenGlassView Answer on Stackoverflow
Solution 2 - AlgorithmSmasheryView Answer on Stackoverflow
Solution 3 - AlgorithmEscualoView Answer on Stackoverflow
Solution 4 - AlgorithmOligoglotView Answer on Stackoverflow
Solution 5 - Algorithmm7dView Answer on Stackoverflow
Solution 6 - AlgorithmGeekView Answer on Stackoverflow
Solution 7 - AlgorithmKyleView Answer on Stackoverflow
Solution 8 - AlgorithmColin HebertView Answer on Stackoverflow
Solution 9 - AlgorithmFabian SteegView Answer on Stackoverflow
Solution 10 - Algorithmhugo24View Answer on Stackoverflow
Solution 11 - AlgorithmFrancisView Answer on Stackoverflow