Comment by lettergram
3 years ago
Frankly, similar to how I was doing it back in 2018 (when you and I chatted about it on HN lol)
https://news.ycombinator.com/item?id=17944293
The approach I took was a bit different, but also no ML required.
The real trick is pruning and going cross-platform. There are around 100k active HN accounts (meaning they post a few times a year), maybe 200k if you count at least one post a year. But <10k post weekly.
It’s a very small space to compare across, so simple methods will work fine.
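To make "pruning plus simple methods" concrete, here is a minimal sketch of one way it could look. This is an assumption for illustration, not the approach from the linked thread: the function-word list, the `min_comments` threshold, and the names `profile`, `cosine`, and `rank_candidates` are all hypothetical. It drops low-activity accounts, then ranks the rest by cosine similarity of function-word frequencies.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny function-word list; a real one would be longer.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "it", "is", "but", "so", "i"]


def profile(comments):
    """Relative frequency of each function word across an account's comments."""
    words = " ".join(comments).lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero for empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(target_comments, accounts, min_comments=3):
    """Prune accounts with too few comments, then rank the rest by similarity."""
    target = profile(target_comments)
    scored = [
        (cosine(target, profile(comments)), name)
        for name, comments in accounts.items()
        if len(comments) >= min_comments  # pruning step (threshold is an assumption)
    ]
    return sorted(scored, reverse=True)


# Toy data, made up for the example.
accounts = {
    "alice": ["the cat is in the hat", "it is the end of the day", "the dog and the cat"],
    "bob": ["i think that i know", "but i do so", "i said that i would"],
    "lurker": ["the the the"],
}
target = ["the bird is in the tree", "it is the start of the week"]
ranked = rank_candidates(target, accounts)
```

With the toy data, `lurker` is pruned before any comparison happens, which is the point: the fewer candidates left, the less the similarity measure has to do.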
Exactly. HN emphasizes long-form posts much more than other forums, which makes the commenters here very susceptible to this kind of analysis. Plus you can fit every single HN comment in RAM on a mid-tier gaming laptop, so it's even easier. I was trying to think of applications for this kind of data and the only thing I could think of was moderation tools/detecting ban evaders, but what you've done seems much more profitable lol.