Comment by dibt

3 years ago

Since it looks for similar word usage, false positives seem to appear more often when specific topics are talked about, like stocks or crypto.

Does this ignore stop words? Or do all words have the same weighting? I wonder if focusing only on stop words would give a more accurate measure. Maybe each of us favors certain stop words over others?

https://en.wikipedia.org/wiki/Stop_words

"Stop words are the words in a stop list (or stoplist or negative dictionary) which are filtered out (i.e. stopped) before or after processing of natural language data (text) because they are insignificant."

costco 3 years ago

All words have the same weighting. I don't ignore stop words; in fact, most of the ngrams I use are composed almost entirely of stop words. Maybe it'd be more effective if I ignored them.
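
For anyone who wants to try the idea from this thread, here is a rough sketch of comparing two texts by word ngrams under three settings: all words weighted equally, stop words dropped, and stop words only. This is not the tool's actual code. The tiny `STOP_WORDS` list, the use of cosine similarity, and every function name here are assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in",
              "i", "it", "is", "that", "this", "for", "on", "with"}

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n=2):
    # Count every run of n consecutive tokens; each ngram has equal weight.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    # Cosine similarity between two ngram count vectors (assumed metric).
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def similarity(text_a, text_b, n=2, mode="all"):
    # mode: "all" keeps every word, "no_stop" drops stop words,
    # "only_stop" keeps nothing but stop words.
    def prep(text):
        tokens = tokenize(text)
        if mode == "no_stop":
            tokens = [t for t in tokens if t not in STOP_WORDS]
        elif mode == "only_stop":
            tokens = [t for t in tokens if t in STOP_WORDS]
        return ngram_counts(tokens, n)
    return cosine(prep(text_a), prep(text_b))
```

Running `similarity(a, b, mode=m)` for each of the three modes on the same pair of texts should show how much of a match comes from topic words (stocks, crypto) versus stop-word habits.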