Comment by User23

3 years ago

Cool. I only skimmed the description; maybe I needed to read it more carefully.

Have you considered using rune n-grams rather than word n-grams? I can imagine that might be prohibitively expensive, but I really don’t know. I did something like that long, long ago in C for automatic document language detection, and it was quite accurate.
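
To make the distinction concrete, here is a minimal sketch in Python. The helper names `rune_ngrams` and `word_ngrams` are hypothetical, and this is neither the C code mentioned above nor the project's implementation. "Rune" is taken to mean a Unicode code point, which is what Python strings index by.

```python
from collections import Counter


def rune_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping n-grams of Unicode code points ("runes")."""
    # Python str slicing works on code points, so multi-byte
    # characters are never split in the middle.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping n-grams of whitespace-separated words."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    print(len(rune_ngrams(sample)), "distinct rune trigrams")
    print(len(word_ngrams(sample)), "distinct word bigrams")
```

The cost question mostly comes down to counts: a text yields roughly one rune n-gram per character but only one word n-gram per word, so the rune version does more work per document. Whether that is prohibitive depends on the corpus size and the value of `n`.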