Comment by rcfox

14 years ago

Maybe it's not fair to cite the work of super heroes, but Peter Norvig wrote a spelling corrector in 21 lines of Python: http://norvig.com/spell-correct.html

It is fair, it is a first step in that direction :-)

That corrector uses no context, though, so it will not catch misspellings that happen to form other valid words (as in "Their coming too sea if its reel.").

This is especially important on the web, where pretty much every conceivable word is "correct" (a company name or otherwise). False positives are costly, and context disambiguation is critical. Let the fun begin.
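
To make the point above concrete, here is a minimal sketch of why a dictionary lookup passes the example sentence while a context-aware pass can flag it. This is not Norvig's method: it swaps in a simple confusion-set approach scored with bigram counts. The word list, the confusion sets, the counts, and the function names (`is_spelled_ok`, `correct_in_context`) are all made up for illustration; a real system would estimate the counts from a large corpus.

```python
# Toy data only: every word list and count here is invented for this example.
DICTIONARY = {"their", "they're", "coming", "too", "to", "sea", "see",
              "if", "its", "it's", "reel", "real"}

# Groups of valid words that are easily mistaken for one another.
CONFUSION_SETS = [{"their", "they're"}, {"too", "to"}, {"sea", "see"},
                  {"its", "it's"}, {"reel", "real"}]

# Hypothetical bigram counts (previous word, word) -> frequency.
BIGRAM_COUNTS = {
    ("they're", "coming"): 20, ("coming", "to"): 30, ("to", "see"): 40,
    ("see", "if"): 25, ("if", "it's"): 15, ("it's", "real"): 10,
    ("if", "its"): 3, ("its", "reel"): 1,
}


def is_spelled_ok(words):
    """Context-free check: is every word in the static dictionary?"""
    return all(w in DICTIONARY for w in words)


def candidates(word):
    """Return the confusion set containing `word`, or just the word itself."""
    for group in CONFUSION_SETS:
        if word in group:
            return group
    return {word}


def correct_in_context(words):
    """Greedy left-to-right pass: pick the candidate that best fits its neighbours."""
    out = []
    for i, word in enumerate(words):
        prev = out[-1] if out else "<s>"
        next_cands = candidates(words[i + 1]) if i + 1 < len(words) else {"</s>"}

        def score(cand):
            # Add-one smoothing so unseen bigrams do not zero out the product.
            left = BIGRAM_COUNTS.get((prev, cand), 0) + 1
            right = max(BIGRAM_COUNTS.get((cand, n), 0) for n in next_cands) + 1
            return left * right

        # sorted() keeps the choice deterministic if two candidates tie.
        out.append(max(sorted(candidates(word)), key=score))
    return out


sentence = "their coming too sea if its reel".split()
print(is_spelled_ok(sentence))
print(" ".join(correct_in_context(sentence)))
```

With these toy counts the dictionary check accepts the sentence as written, while the context pass rewrites it to "they're coming to see if it's real". The greedy scoring is the simplest thing that works here; it would need a proper language model and a cost for changing a word to avoid false positives on real text.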

  • Incidentally, the linked article by Peter Norvig also mentions that adding context is the best way to improve spelling correction. If you think this claim is controversial, please explain why rather than just downvoting.

  • You keep using this word "spelling". I don't think it means what you think it means.

    This is GRAMMAR checking, or at least grammar-assisted spell checking.

    Very few, if any, shipping mainstream spelling correctors do that.

    • People want their documents (or queries) free of spelling errors. That is their pain and that is the challenge.

      The example sentence has misspelled words, hence it is within the domain of spell checking. Misspelled words of this type are called "homonyms", and they are a very common spelling problem. The academic terminology is uninteresting to most users, however.

      Or if you mean to posit that "spell checking really means looking up if a word exists in a static dictionary of English", then yes, that's easy and solved, no argument there.

    • No, it's spell-checking. Use of the word "reel" in this sentence, for instance, is definitely a spelling error. There's no grammatically valid form of the word "real" spelled with two 'e's. The fact that "reel" happens to be a valid word doesn't mean that its presence in the sentence is due to a grammatical error - it's just an accidental collision.

      That no spell checker might be able to catch this specific class of error doesn't change the type of error it is.