Gzip compression will achieve around 6x compression on a sorted wordlist. So even the 2008 wordlist of 240K words would reduce to about 400K. Cut the list in half or more, use a compression better suited for a sorted wordlist, and you could have a useful wordlist that would fit temporarily in RAM even in 256K machines.
I think that drfanke is suggesting that you load in half the compressed dictionary ( or a third, or whatever percentage works), and then because the document has been sorted in alphabetical order, you simply start comparing words. When you get to a word that is alphabetically after the last entry in the current dictionary, you load the next segment of the dictionary. It's not a bad approach, and even has the interesting property of enabling a load of speed enhancements of the algo, because the next word in your document is going to be nearby to its spelling in the dictionary...
Does anyone know if this algo was ever actually used?
Gzip compression will achieve around 6x compression on a sorted wordlist. So even the 2008 wordlist of 240K words would reduce to about 400K. Cut the list in half or more, use a compression better suited for a sorted wordlist, and you could have a useful wordlist that would fit temporarily in RAM even in 256K machines.
Perhaps you should do a http://de.wikipedia.org/wiki/Burrows-Wheeler-Transformation first?
I think that drfanke is suggesting that you load in half the compressed dictionary ( or a third, or whatever percentage works), and then because the document has been sorted in alphabetical order, you simply start comparing words. When you get to a word that is alphabetically after the last entry in the current dictionary, you load the next segment of the dictionary. It's not a bad approach, and even has the interesting property of enabling a load of speed enhancements of the algo, because the next word in your document is going to be nearby to its spelling in the dictionary...
Does anyone know if this algo was ever actually used?
Compressed on disk/floppy. Decompress one block at a time as you scan through it.