Comment by Someone

4 months ago

For the basic word list, possibly tries (https://en.wikipedia.org/wiki/Trie), DAGs (https://en.wikipedia.org/wiki/Directed_acyclic_graph#Data_co...), or Bloom filters (https://en.wikipedia.org/wiki/Bloom_filter)

The article is about fitting large dictionaries into small memory footprints: writing a 200K-word spell checker on a machine with only 256K of memory.

When you need to store your dictionary in under 1 byte per word, a trie won't cut it.

According to https://en.wikipedia.org/wiki/Ispell, ispell (1971) already used Levenshtein distance (although the article does not say whether this already existed in the original version or was added in later years).

  • Levenshtein distance up to 1, according to that article. If you have a hierarchical structure of valid words (a trie or a DAG; in some sense, a DAG is a trie stored more efficiently, with the disadvantage that adding or removing words is hard), it is not hard to check which words satisfy that. If you only do the inexact search after looking for the exact word and finding it missing, I think it also won’t be too slow when given ‘normal’ text to spell-check.
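
To make the idea above concrete, here is a minimal sketch in Python of an exact lookup followed by a distance-1 search over a plain trie. It is only an illustration under my own assumptions, not how ispell actually does it: the names TrieNode, build_trie, contains, within_distance_one and suggest are made up for this example, it uses plain Levenshtein edits (insert, delete, substitute; no transpositions), and it makes no attempt at the memory compression the article is about.

```python
class TrieNode:
    """One trie node: child links keyed by character, plus an end-of-word flag."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Build a trie from an iterable of words (hypothetical helper)."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def contains(root, word):
    """Exact lookup: follow one path, at most len(word) steps."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


def within_distance_one(root, word):
    """Return the sorted dictionary words at Levenshtein distance <= 1 from word."""
    found = set()

    def walk(node, i, edits_left, prefix):
        # i = number of input characters consumed so far;
        # prefix = the dictionary characters spelled out on the path to node.
        if i == len(word) and node.is_word:
            found.add(prefix)
        if i < len(word):
            child = node.children.get(word[i])
            if child is not None:
                # Characters match: advance in both without spending an edit.
                walk(child, i + 1, edits_left, prefix + word[i])
        if edits_left == 0:
            return
        if i < len(word):
            # The input has an extra character: skip it.
            walk(node, i + 1, 0, prefix)
        for ch, child in node.children.items():
            # The input is missing a character: take ch from the trie.
            walk(child, i, 0, prefix + ch)
            if i < len(word) and ch != word[i]:
                # The input has a wrong character: replace it with ch.
                walk(child, i + 1, 0, prefix + ch)

    walk(root, 0, 1, "")
    return sorted(found)


def suggest(root, word):
    """Try the cheap exact lookup first; only fall back to the inexact search on a miss."""
    if contains(root, word):
        return [word]
    return within_distance_one(root, word)


if __name__ == "__main__":
    trie = build_trie(["cat", "cart", "car", "bat", "dog"])
    print(suggest(trie, "cat"))
    print(suggest(trie, "caat"))
    print(suggest(trie, "dg"))
```

Once the single edit has been spent, the walk can only follow exact matches, so each branch dies quickly. Together with doing the exact lookup first, that is why the inexact search should rarely be the bottleneck on mostly correct text.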