Comment by Radim
14 years ago
People want their documents (or queries) free of spelling errors. That is their pain and that is the challenge.
The example sentence has misspelled words, hence is within the domain of spell checking. Misspelled words of this type are called "homonyms", and they are a very common spelling problem. The academic terminology is uninteresting to most users, however.
Or if you mean to posit that "spell checking really means looking up if a word exists in a static dictionary of English", then yes, that's easy and solved, no argument there.
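To make the "static dictionary" definition concrete, here is a minimal sketch of that kind of check. The word list and the `unknown_words` helper are hypothetical and only for illustration; a real checker would load a full dictionary.

```python
import re

# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {
    "their", "they're", "coming", "too", "to", "sea", "see",
    "if", "its", "it's", "reel", "real",
}

def unknown_words(text, dictionary=DICTIONARY):
    """Return the words in text that are not found in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in dictionary]

# Every word exists, so a pure lookup reports nothing.
print(unknown_words("Their coming too sea if its reel."))   # []
# Non-words are the only thing this approach can catch.
print(unknown_words("Thier comming to see if it's real."))  # ['thier', 'comming']
```

Under this definition the first sentence passes, because each word is checked in isolation and no context is consulted.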
> The example sentence has misspelled words, hence is within the domain of spell checking.
No, it doesn't. It is grammatically incorrect, but all the words are spelled correctly. You're definitely talking about a grammar checker, not a spelling checker.
If you were a teacher marking a student's paper, you would label those as spelling errors, not grammar errors. The reason is that a different spelling of the words would create the intended sentence. No grammatical variation will do that (reordering, conjugating differently, etc.).
Really? Most teachers I had would put WW (wrong word) or WC (word choice) to imply that the word is incorrect, which is considered grammatical.
I understand this is a mismatch in terminology.
In your view, "spelling check" is applied to individual words, to see if they appear in a fixed dictionary (see my other comment about difficulties with choosing this "correct" dictionary in reality, though). I can imagine this view is inviting for programmers, because it's easy to implement, but I doubt anyone else finds useful a definition that says there are no misspellings in "Their coming too sea if its reel."
In my view, "spell check" applies to utterances and roughly means "all words are spelled as per the norm of the language; I can send this document to my boss/customer and they won't laugh at my spelling." It is a more user-centric view, and more complex too, because it covers intent and norms, as opposed to the comforting lookup table for a few hand-picked strings. Modern spell checkers make heavy use of statistical analysis of large text corpora, to reasonably approximate context needed to model such intent.
Once we agree on the terminology, I believe we are in agreement, so let's not split hairs. The "correctly spelled" sentence under question comes from the Wikipedia article on spell checking, by the way.
> I can imagine this view is inviting for programmers, because it's easy to implement, but I doubt anyone else finds useful a definition that says there are no misspellings in "Their coming too sea if its reel."
I realize this is just a debate over a definition, so it's not very meaningful, but the fact is, you're on the wrong side of the common definition here. I just tested, and every spell checker I tried (Chrome, Firefox, MS Word, TextMate, TextEdit; perhaps some of these rely on the same underlying engine, I'm not sure) accepts that sentence as not having a spelling error, so clearly there's some use for such a definition.
The grammar checkers, on the other hand, don't like it, but by changing "their" to "they're", they all accept it, despite the fact that it's still garbage. So don't overestimate how good "modern" spell checkers are... though there may be techniques to do a better job, they're not in common use, at least in the most common spell-checking contexts (which, let's be honest, pretty much means MS Word).
The sentence contains misspelled words. The errors in those words happen to make them collide with other existing words. That doesn't change the category of the error.
"""The sentence contains misspelled words."""
No, it contains correctly spelled words used in place of other desired words.
"I sea your eyes", "I she your eyes"
If you want to correct these kinds of errors, you must know a lot about natural language. Also, the above are trivial cases. There are tons of edge cases and far more difficult distinctions. Here's an amusing one that can lead to "Microsoft paperclip"-like interactions:
"I gave him the new pink dress as a present" => "I gave her the new pink dress as a present"
No, idiotic spell-checker, I do mean him. My friend is a cross-dresser, shut up and let me type.
Such a spellchecker would also be useless for poetry. And if you find poetry obscure, so it doesn't really matter, then such a spellchecker would also be useless for irony. Suddenly, you lose all the hipsters from your potential users (except if they start using it ironically).
Anyway, no spell checker in widespread use attempts this, and it's probably a very hard nut to crack, probably uncrackable in the general case.
Hey, here's a chance to invent some new terminology!
I would say that Norvig's corrector is a first-order spelling corrector, since it works within the context of a single word.
A second-order corrector would take into account the word before or after it to choose the spelling that is more likely to make sense. ("Their coming" would suggest a correction to "They're coming")
Third-, fourth-, (and so on) order expands the distance of words considered.
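A rough sketch of what a "second-order" corrector might look like, assuming bigram counts from some large corpus are available. The confusion sets and counts below are made up for illustration, and `suggest` is a hypothetical helper, not part of Norvig's corrector.

```python
from collections import Counter

# Hypothetical confusion sets: words commonly typed in place of one another.
CONFUSION_SETS = [
    {"their", "they're", "there"},
    {"too", "to", "two"},
    {"sea", "see"},
]

# Made-up bigram counts standing in for statistics from a large corpus.
BIGRAM_COUNTS = Counter({
    ("they're", "coming"): 50,
    ("their", "coming"): 5,
    ("coming", "to"): 80,
    ("coming", "too"): 3,
    ("to", "see"): 90,
    ("to", "sea"): 2,
})

def suggest(words):
    """Suggest replacements using only the word before and the word after."""
    suggestions = []
    for i, word in enumerate(words):
        for group in CONFUSION_SETS:
            if word not in group:
                continue

            def score(candidate):
                # Sum of bigram counts with the left and right neighbours.
                left = BIGRAM_COUNTS[(words[i - 1], candidate)] if i > 0 else 0
                right = BIGRAM_COUNTS[(candidate, words[i + 1])] if i + 1 < len(words) else 0
                return left + right

            best = max(sorted(group), key=score)
            # Only suggest when an alternative is strictly more likely in context.
            if score(best) > score(word):
                suggestions.append((i, word, best))
    return suggestions

print(suggest("their coming too sea".split()))
# [(0, 'their', "they're"), (2, 'too', 'to')]
```

Note that "sea" is not flagged here: its only neighbour is the still-uncorrected "too", and the toy counts have no evidence for either "too sea" or "too see". A wider window, or re-scoring after applying earlier suggestions, would be needed to catch it.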
"""("Their coming" would suggest a correction to "They're coming")"""
How about: "Her relatives would visit us for Christmas. Their coming filled us with dread!"
It's valid, but less common, and that's why I specifically chose the wording "suggest a correction" instead of "correct". Spell checkers are still no substitute for actual thought.