Comment by dfranke

18 years ago

I don't get what the big deal is. As long as the document you're checking fits in RAM, then you can just sort the words alphabetically and then check them in a single pass through the dictionary. If you really need random access, use a B+ tree.
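
A minimal sketch of that sort-then-one-pass idea (the word array, entry length, and I/O handling here are my own assumptions, not anything dfranke specified):

    /* Sketch: sort the document's words, then make one sequential pass
     * through an already-sorted dictionary (a merge-style join).
     * Word extraction and error handling are simplified for illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp(const void *a, const void *b) {
        return strcmp(*(const char *const *)a, *(const char *const *)b);
    }

    /* words: the document's words; dict: a sorted, one-word-per-line dictionary */
    void spellcheck(char **words, size_t nwords, FILE *dict) {
        qsort(words, nwords, sizeof *words, cmp);    /* sort document words */
        char entry[64];
        size_t i = 0;
        while (i < nwords && fgets(entry, sizeof entry, dict)) {
            entry[strcspn(entry, "\n")] = '\0';
            int c;
            /* every document word that sorts before this dictionary entry
             * was never matched by an earlier entry, so flag it */
            while (i < nwords && (c = strcmp(words[i], entry)) <= 0) {
                if (c < 0) printf("misspelled? %s\n", words[i]);
                i++;
            }
        }
        for (; i < nwords; i++)                      /* words past the dictionary's end */
            printf("misspelled? %s\n", words[i]);
    }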

Where do you plan to store the dictionary for reference?

  • I think that dfranke is suggesting that you load half the compressed dictionary (or a third, or whatever fraction fits), and then, because the document has been sorted alphabetically, you simply start comparing words. When you get to a word that sorts after the last entry of the currently loaded segment, you load the next segment of the dictionary. It's not a bad approach, and it even has the interesting property of enabling a load of speed enhancements to the algo, because the next word in your document is going to be close to the previous one's position in the dictionary... (a rough sketch follows below)

    Does anyone know if this algo was ever actually used?
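
If I'm reading the suggestion right, something along these lines (segment size, record layout, and the loading helper are made up for illustration; a real version would decompress chunks off the floppy):

    /* Sketch of the segmented-dictionary idea: hold only one alphabetical
     * chunk of the dictionary in memory, and pull in the next chunk once
     * the sorted document words move past the last entry currently loaded. */
    #include <stdio.h>
    #include <string.h>

    #define SEG_WORDS 4096
    #define WORD_LEN  32

    /* read up to SEG_WORDS entries of a sorted dictionary; returns how many */
    static size_t load_segment(FILE *dict, char seg[][WORD_LEN]) {
        size_t n = 0;
        while (n < SEG_WORDS && fgets(seg[n], WORD_LEN, dict)) {
            seg[n][strcspn(seg[n], "\n")] = '\0';
            n++;
        }
        return n;
    }

    /* words must already be sorted alphabetically */
    void spellcheck_segmented(char **words, size_t nwords, FILE *dict) {
        static char seg[SEG_WORDS][WORD_LEN];
        size_t n = load_segment(dict, seg);
        size_t j = 0;                         /* cursor inside the current segment */
        for (size_t i = 0; i < nwords && n > 0; i++) {
            /* this word sorts past the end of the loaded segment: load the next */
            while (n > 0 && strcmp(words[i], seg[n - 1]) > 0) {
                n = load_segment(dict, seg);
                j = 0;
            }
            /* both sides are sorted, so the cursor only ever moves forward */
            while (j < n && strcmp(seg[j], words[i]) < 0)
                j++;
            if (n > 0 && (j == n || strcmp(seg[j], words[i]) != 0))
                printf("misspelled? %s\n", words[i]);
        }
    }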

Isn't alphabetically sorting a document containing (potentially) tens of thousands of words going to be pretty slow on a machine of this vintage? Isn't it a bit perverse to have a spellcheck which scales worse than O(N)?

  • Typical clock rates at that time would have been around 5 MHz. So let's say you have a 64KB document containing 16K words. A merge sort needs roughly N log N = 16K × 14 ≈ 2^18 comparisons, each requiring a small number of CPU instructions, so it would take on the order of a quarter of a second. That's better than MS Word running on a modern system :-). Besides, no matter what you do, the bottleneck is going to be reading the dictionary off the floppy.
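
Rough back-of-envelope behind that number (the instructions-per-comparison figure is just a guess):

    /* Back-of-envelope for the quarter-second sort estimate above.
     * 5 instructions per comparison at one instruction per cycle are guesses. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double words       = 16 * 1024;              /* 16K words in a 64KB document */
        double comparisons = words * log2(words);    /* merge sort: N log N, ~2^18 */
        double per_cmp     = 5.0;                    /* assumed instructions per compare */
        double clock_hz    = 5e6;                    /* 5 MHz CPU */
        printf("~%.2f s to sort the document\n", comparisons * per_cmp / clock_hz);
        return 0;
    }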