Comment by pistoriusp

18 years ago

I remember bloom filters (http://en.wikipedia.org/wiki/Bloom_filter) been good for this sort of thing.

4 comments

pistoriusp

This article [] was linked in the Reddit comments for this post - it goes into an explanation of bloom filters and then includes the code for a spell checker implemented with them.

[]: http://ipowerinfinity.wordpress.com/2008/03/02/bloom-filters...

jrockway 18 years ago

Thanks for this link. Much more interesting than the article.

dangoldin 18 years ago

Definitely. I enjoy reading about these data structures/algorithms.
Seems a lot of these cool things came out in the 80s due to the limits of memory/computational speed.

aston 18 years ago

The idea would be to read in the entire dictionary and mark the Bloom Filter for each word. Then whenever you get a word you want to test, see whether all of the places it would mark were previously marked. If it misses any, you know it's not a word. Unfortunately, if it doesn't miss, you're only sure it's probably a word (with the worst case probability controlled by the number of hash functions you use).