Comment by pistoriusp

18 years ago

I remember bloom filters (http://en.wikipedia.org/wiki/Bloom_filter) been good for this sort of thing.

Thanks for this link. Much more interesting than the article.

  • Definitely. I enjoy reading about these data structures/algorithms.

    Seems a lot of these cool things came out in the 80s due to the limits of memory/computational speed.

The idea would be to read in the entire dictionary and mark the Bloom Filter for each word. Then whenever you get a word you want to test, see whether all of the places it would mark were previously marked. If it misses any, you know it's not a word. Unfortunately, if it doesn't miss, you're only sure it's probably a word (with the worst case probability controlled by the number of hash functions you use).