Comment by pistoriusp
18 years ago
I remember bloom filters (http://en.wikipedia.org/wiki/Bloom_filter) been good for this sort of thing.
18 years ago
I remember bloom filters (http://en.wikipedia.org/wiki/Bloom_filter) been good for this sort of thing.
This article [] was linked in the Reddit comments for this post - it goes into an explanation of bloom filters and then includes the code for a spell checker implemented with them.
[]: http://ipowerinfinity.wordpress.com/2008/03/02/bloom-filters...
Thanks for this link. Much more interesting than the article.
Definitely. I enjoy reading about these data structures/algorithms.
Seems a lot of these cool things came out in the 80s due to the limits of memory/computational speed.
The idea would be to read in the entire dictionary and mark the Bloom Filter for each word. Then whenever you get a word you want to test, see whether all of the places it would mark were previously marked. If it misses any, you know it's not a word. Unfortunately, if it doesn't miss, you're only sure it's probably a word (with the worst case probability controlled by the number of hash functions you use).