Comment by costco

3 years ago

> Edit: by bigrams and trigrams, I mean reducing words to their part-of-speech labels and using THOSE as word tokens. You’ll find that native English speakers have higher weights on some phrase construction patterns than, say, folks from Romania. TF-IDF is useful for these POS-grams (just made that word up) as well.

That is a very good idea, and when I update the site it will almost certainly be included :) Any other tips? I've been reading papers for ideas, and I think I may have to ditch cosine similarity and go for something fancier soon. Thank you.
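
For anyone following along, here is a rough sketch of what the POS-gram idea could look like, using only the standard library. Everything in it is an assumption for illustration: the tag sequences are hand-written Penn Treebank-style examples rather than real tagger output, the helper names (`pos_ngrams`, `tfidf_vectors`, `cosine`) are made up, and the smoothed IDF is just one common variant. It is not the code the site uses.

```python
import math
from collections import Counter


def pos_ngrams(tags, n):
    """Return the n-grams over a POS tag sequence, each joined with '_'."""
    return ["_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def tfidf_vectors(docs):
    """Turn token lists into sparse {term: weight} TF-IDF dicts.

    TF is the relative frequency within the document; IDF is the
    smoothed form ln((1 + N) / (1 + df)) + 1 (one variant among several).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, hand-tagged comments (a real pipeline would run a POS tagger).
tagged = {
    "a": ["DT", "JJ", "NN", "VBZ", "DT", "NN"],
    "b": ["DT", "JJ", "NN", "VBD", "DT", "JJ", "NN"],
    "c": ["NN", "DT", "VBZ", "JJ", "NN", "DT"],
}

# Each document is its POS bigrams plus its POS trigrams.
docs = [pos_ngrams(tags, 2) + pos_ngrams(tags, 3) for tags in tagged.values()]
vecs = tfidf_vectors(docs)

sim_ab = cosine(vecs[0], vecs[1])  # a and b share several construction patterns
sim_ac = cosine(vecs[0], vecs[2])  # a and c share only one bigram
print(f"a~b: {sim_ab:.3f}  a~c: {sim_ac:.3f}")
```

The POS-gram vectors can be concatenated with ordinary word-level TF-IDF features, and `cosine` is only the baseline here, so it can be swapped for whatever fancier measure ends up replacing it.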