Comment by operator-name

3 years ago

I wouldn't worry about that too much, as someone has already done something similar for Reddit (https://towardsdatascience.com/using-nlp-to-identify-reddito...) and released their code publicly (https://github.com/jabraunlin/reddit-user-id).

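Given the technique used, I don't see why something simple and local wouldn't defeat it. The "easiest" approach would be to use the same weighting as a negative metric when rewriting: score each candidate rewrite by how strongly it matches your own fingerprint, and keep the one that matches least.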
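
To make that concrete, here is a rough sketch of what I mean. It is not taken from the linked repo; I'm assuming the "weighting" is some per-word score of how distinctive a word is for one author, and I stand in for it with a smoothed log-ratio of word rates. The names `distinctiveness`, `penalty`, and `rewrite`, and the hand-written synonym table, are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())


def distinctiveness(author_texts, background_texts):
    """Assumed stand-in for the identifier's weighting: the add-one smoothed
    log-ratio of a word's rate in the author's text vs. background text."""
    a = Counter(w for t in author_texts for w in tokenize(t))
    b = Counter(w for t in background_texts for w in tokenize(t))
    vocab = set(a) | set(b)
    a_total = sum(a.values()) + len(vocab)
    b_total = sum(b.values()) + len(vocab)
    return {
        w: math.log(((a[w] + 1) / a_total) / ((b[w] + 1) / b_total))
        for w in vocab
    }


def penalty(text, weights):
    # The "negative metric": only words that point towards the author count.
    return sum(max(weights.get(w, 0.0), 0.0) for w in tokenize(text))


def rewrite(text, weights, synonyms):
    """Greedy word-by-word rewrite using a hypothetical synonym table.
    Splits on whitespace only, so punctuation stuck to a word blocks the
    lookup; ties keep the original word."""
    out = []
    for word in text.split():
        options = [word] + synonyms.get(word.lower(), [])
        out.append(min(options, key=lambda w: penalty(w, weights)))
    return " ".join(out)


if __name__ == "__main__":
    mine = ["whilst I reckon this is fine", "I reckon so whilst waiting"]
    others = ["while I think this is fine", "I think so while waiting"]
    weights = distinctiveness(mine, others)
    table = {"whilst": ["while"], "reckon": ["think"]}
    print(rewrite("I reckon whilst waiting", weights, table))
```

A real tool would need a better source of candidate rewrites than a hand-written table, and would have to cover whatever features the identifier actually uses, not just single words. The scoring loop would stay the same shape, though.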