Comment by Radim

14 years ago

I know it's HN folklore to claim "Trivial! Would do over a weekend!", but please...

Doing a half-decent production spell checker is STILL a major feat. Same as "just crawling the web" (further down the discussion). Both require problem understanding and engineering you can't see and appreciate at a glance.

And no, looking up individual words in some predefined dictionary doesn't qualify as half-decent spell checking, especially for non-English languages. Spelling correction is another step.

    "Their coming too sea if its reel."

That's not the point of the article. He's not talking about writing a state-of-the-art spelling-and-grammar checker.

> And no, looking up individual words in some predefined dictionary doesn't qualify as half-decent spell checking,

Well, but the author is talking about that problem! Even if you don't consider that real spell-checking, his point still stands. Let's define crappy-spell-checking as "looking up individual words in some predefined dictionary"; that problem used to be hard and now it's very easy, as in, you could write one in 15 minutes using Python.
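For concreteness, here is a minimal sketch of that "crappy-spell-checking" in Python. The tiny `WORDS` set and the `misspelled` helper are made up for illustration; a real checker would load a proper word list from a file.

```python
import re

def misspelled(text, dictionary):
    """Return the words in `text` that are not in `dictionary`."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in dictionary]

# Toy word list (an assumption for this sketch, not a real dictionary).
WORDS = {"their", "coming", "too", "sea", "if", "its", "reel", "the", "cat"}

print(misspelled("Their coming too sea if its reel.", WORDS))  # []
print(misspelled("Teh cat", WORDS))                            # ['teh']
```

Note that the example sentence from upthread passes cleanly, because every word in it is a valid dictionary entry.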

  • Ok, fair point -- I blame the misleading title :)

    I read the article as comparing "spell-checking then (80s) and now", whereas others read it more along the lines of "looking up static English words then and now". Your nickname sounds Japanese, but I assume you're talking about English as well, with those 15 minutes.

Maybe it's not fair to cite the work of super heroes, but Peter Norvig wrote a spelling corrector in 21 lines of Python: http://norvig.com/spell-correct.html
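A rough sketch of the general idea, as I understand it: generate every string one edit away from the input and pick the most frequent known word. This is not Norvig's code; the `counts` dictionary below is a made-up stand-in for word frequencies taken from a real corpus.

```python
import string

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Return `word` if known, else the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word

# Hypothetical frequencies, for illustration only.
counts = {"the": 10, "ten": 2, "spelling": 3}
print(correct("teh", counts))       # the
print(correct("speling", counts))   # spelling
```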

  • It is fair, it is a first step in that direction :-)

    That corrector has no context though, so it will not correct misspelled words that happen to come out as other valid words (including the "Their coming too sea if its reel." example).

    This is especially important on the web, where pretty much every conceivable word is "correct" (a name of a company or otherwise). False positives are costly, and context disambiguation critical. Let the fun begin.
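To make "context disambiguation" concrete, here is a toy sketch that picks among confusable words by scoring each candidate against its neighbours. The confusion sets and bigram counts are invented for this example; a real system would estimate them from a large corpus and would need a far better model than this.

```python
# Hypothetical confusion sets: alternatives to consider for each word.
CONFUSABLE = {
    "their": ["they're", "there"],
    "too": ["to", "two"],
    "sea": ["see"],
}

# Hand-made bigram counts, NOT taken from any real corpus.
BIGRAMS = {
    ("they're", "coming"): 3,
    ("coming", "to"): 4,
    ("to", "sea"): 1,
    ("to", "see"): 5,
}

def correct_in_context(words):
    """Replace each word with the candidate that best fits its neighbours."""
    out = []
    for i, word in enumerate(words):
        # The original word comes first, so it wins any tie in max().
        candidates = [word] + CONFUSABLE.get(word, [])
        prev = out[-1] if out else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        best = max(
            candidates,
            key=lambda c: BIGRAMS.get((prev, c), 0) + BIGRAMS.get((c, nxt), 0),
        )
        out.append(best)
    return out

print(correct_in_context(["their", "coming", "too", "sea"]))
# ["they're", 'coming', 'to', 'see']
```

Keeping the original word on ties is one crude way to limit false positives: a word is only replaced when the context actively favours an alternative.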

    • Incidentally, the linked article by Peter Norvig also mentions that adding context is the best way to improve spelling correction. If you think this claim is so controversial, explain yourself; please don't just downvote.

    • You keep using this word "spelling". I don't think it means what you think it means.

      This is GRAMMAR checking, or at least grammar-assisted spell checking.

      Very few, if any, shipping mainstream spelling correctors do that.

That'd be a grammar checker, that's a whole 'nother beast. That should spell check just fine.

  • Spell checking for languages that sport a high rate of agglutination, where you can't (feasibly) enumerate every possible form of a stem, is still a major feat. Languages like Turkish, Finnish and Hungarian are prime examples of this.
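A toy illustration of why enumeration breaks down: instead of listing every surface form, a checker can try to peel suffixes off until it reaches a known stem. The sketch below uses the Turkish stem "ev" (house) with three suffixes. It is a deliberate oversimplification: it ignores suffix ordering and vowel harmony, so it accepts some invalid forms, and real systems use proper morphological analyzers instead.

```python
# Tiny made-up lexicon for illustration; not a real Turkish morphology.
STEMS = {"ev"}                    # "house"
SUFFIXES = ("ler", "imiz", "de")  # plural, "our", locative

def accepts(word):
    """True if `word` is a known stem followed by any sequence of known suffixes."""
    if word in STEMS:
        return True
    # Strip one suffix from the end and recurse on what is left.
    return any(
        word.endswith(suffix) and accepts(word[: -len(suffix)])
        for suffix in SUFFIXES
    )

print(accepts("evlerimizde"))   # True  ("in our houses")
print(accepts("evlerimizdee"))  # False
```

Even with one stem and three suffixes the accepted set is unbounded, which is the point: a flat word list cannot cover it.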