Comment by quuxplusone

2 hours ago

An interesting idea I saw long ago in some book (I thought it was K&P's "Software Tools," or my second guess was K&R1, but neither of those panned out; a strong Mandela effect) was a whole-document spellchecker that works purely probabilistically, by histograms: you feed it a document, it tallies the trigraphs (runs of three consecutive letters), and any word containing a trigraph that appears only rarely is flagged as a likely typo. This approach lets through unknown-but-realistic words like "antithematory" while flagging unrealistic words like "prisencolinensinainciusol" (because of its unlikely "ciu" and "ius" clusters) and "antthemaory" (because of "ntt" and "aor").

To make this approach work better, feed it a bunch of English text (or whatever language your document is in) before the document you really want to "spellcheck."
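
Here's a rough sketch of how I imagine it working, in Python. This is my own reconstruction, not the code from whatever book it was. The names (`lint_spelling`, `reference_text`, `min_count`) and the "fewer than `min_count` occurrences counts as rare" cutoff are all assumptions I made up for illustration; a real tool might well use relative frequencies instead.

```python
from collections import Counter
import re


def trigraphs(word):
    """Yield every run of three consecutive letters in a word."""
    for i in range(len(word) - 2):
        yield word[i:i + 3]


def words_of(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def lint_spelling(document, reference_text="", min_count=2):
    """Return {word: [rare trigraphs]} for the words in `document`.

    The histogram is tallied over the reference text plus the document
    itself. A trigraph seen fewer than `min_count` times in that tally
    is treated as rare (an assumed, arbitrary cutoff).
    """
    counts = Counter()
    for word in words_of(reference_text + " " + document):
        counts.update(trigraphs(word))

    flagged = {}
    for word in words_of(document):
        rare = [t for t in trigraphs(word) if counts[t] < min_count]
        if rare:
            flagged[word] = rare
    return flagged


if __name__ == "__main__":
    reference = ("the anthem and the theme of the matter; "
                 "another anthem, another theme")
    print(lint_spelling("the antthem", reference))
```

Note that a typo repeated often enough in the document stops looking rare, which is one more reason to prime the tally with plenty of known-good text first.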

Essentially this isn't a spell "checker" so much as a spell "linter" — it looks for antipatterns statistically associated with bugs, and reports the patterns for further investigation.

If anyone knows where this trigraph-based "spellchecker" was first presented, I'd love to track it down again.

mattkrause 1 hour ago

That's uhh...a language model?