Comment by bscphil

3 years ago

The scary thing is that once you have this data, finding HN matches for individual targeted users on other sites becomes trivial, even if those sites are harder to scrape. I bet most people here have an anonymous Reddit account, for example. If you wanted to know who was behind a particular Reddit account, you could feed it into something like this and compare the results with HN, where accounts are less likely to be anonymous. Or build a database from blogs, GitHub comments, etc.
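A minimal sketch of the matching step described above. It assumes each account's comments have already been collected into one string, and it compares raw word-frequency vectors with cosine similarity. The names `word_freqs`, `cosine`, and `rank_candidates` and the toy accounts are hypothetical, and this is not necessarily how the tool under discussion scores accounts.

```python
import math
import re
from collections import Counter


def word_freqs(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a real tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(target_text: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Score every known (e.g. HN) account against the target text, best match first."""
    target = word_freqs(target_text)
    scores = [(name, cosine(target, word_freqs(text))) for name, text in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: two "HN" accounts and one "Reddit" post.
hn_accounts = {
    "alice": "I reckon the borrow checker is honestly fine once you get used to it.",
    "bob": "Kubernetes is overkill for most startups, just use a VPS.",
}
reddit_post = "Honestly I reckon the borrow checker is fine after a week."
print(rank_candidates(reddit_post, hn_accounts)[0][0])  # alice
```

Real stylometry usually restricts the vocabulary to common function words so that topic (Rust vs. Kubernetes here) doesn't dominate the score; this toy example benefits from topic overlap as much as from style.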

Also, since this uses only word frequency, there are probably relatively easy improvements that would make it even more powerful, like looking at particular runs of words that are unique. Some expressions and figurative language only show up as combinations of words, and they tend to be highly style-specific.
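One way to look at "runs of words" is to count word n-grams and keep only those two texts share but that are rare in a background corpus. This is a hypothetical sketch: `word_ngrams`, `shared_rare_ngrams`, the `max_df` threshold, and the toy sentences are illustrative assumptions, not part of the tool being discussed.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count runs of n consecutive lowercased words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def shared_rare_ngrams(a: str, b: str, background: list[str],
                       n: int = 2, max_df: int = 0) -> set[tuple[str, ...]]:
    """n-grams present in both texts and in at most max_df background documents."""
    doc_freq: Counter = Counter()
    for doc in background:
        # Count each n-gram once per document (document frequency, not raw count).
        doc_freq.update(set(word_ngrams(doc, n)))
    shared = word_ngrams(a, n).keys() & word_ngrams(b, n).keys()
    return {g for g in shared if doc_freq[g] <= max_df}


background = ["the cat sat on the mat", "we went to the shop on the corner"]
a = "that idea is a total nothingburger in my book"
b = "in my book this whole launch was a nothingburger"
print(shared_rare_ngrams(a, b, background, n=3))  # {('in', 'my', 'book')}
```

Note that single-word frequency would match "nothingburger" here too; the n-gram view additionally picks up the idiom "in my book", which is the kind of multi-word expression the comment is talking about.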

I could have used a part-of-speech tagger, or looked at the time of day a user posts, capitalization, spelling errors, etc. From what I understand, the state of the art is light-years ahead of this; there are even companies with actual linguists who will act as expert witnesses in court to say things like "we can say with 95% certainty that xyz authored this email." Honestly, it's kind of scary. There are papers on cross-platform authorship attribution; one, I think, did it with Twitter, Blogspot, and G+ and had pretty good results.
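A rough sketch of a few of the non-lexical features mentioned above (posting hour, capitalization habits). Part-of-speech tags and spelling errors are left out because they need a tagger or dictionary. The function name, the feature names, the UTC bucketing, and the naive sentence splitter are all assumptions for illustration, not anyone's published method.

```python
import re
from datetime import datetime, timezone


def style_features(comments: list[tuple[str, int]]) -> dict[str, float]:
    """Crude non-lexical features for one account.

    comments: (text, unix_timestamp) pairs. Posting hours are bucketed in UTC.
    """
    hour_counts = [0] * 24
    sentences = lowercase_starts = 0
    words = lowercase_i = 0
    for text, ts in comments:
        hour_counts[datetime.fromtimestamp(ts, tz=timezone.utc).hour] += 1
        # Very rough sentence split: break after . ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                sentences += 1
                lowercase_starts += int(sentence[0].islower())
        tokens = re.findall(r"[A-Za-z']+", text)
        words += len(tokens)
        lowercase_i += sum(1 for t in tokens if t == "i")  # "i" instead of "I"
    total = len(comments) or 1
    features = {f"hour_{h:02d}": c / total for h, c in enumerate(hour_counts)}
    features["lowercase_sentence_start"] = lowercase_starts / (sentences or 1)
    features["lowercase_i_per_word"] = lowercase_i / (words or 1)
    return features


feats = style_features([("i think so. Maybe not.", 0), ("ok i guess", 3600)])
print(feats["hour_00"], feats["lowercase_i_per_word"])  # 0.5 0.25
```

These values could be concatenated with word or n-gram frequencies before comparing accounts, though how to weight them against each other is a separate (and harder) question.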

Thus proving the only actually anonymous community in practice is 4chan, and that’s why it’s so toxic.

  • If you define “toxic” as “people disagreeing with you”, sure. That was what the entire internet was like until maybe 2005.

    • I'm old enough to remember when 4chan was self-identifying as the Internet's hate machine, before xkcd referenced it as such: https://xkcd.com/591/

      Sometimes people insist that's all role-play and irony; others insist that if it ever was, it certainly isn't now.

      But regardless, I remember pre-2005, and it wasn't all like what I saw the two times I looked at 4chan. Bits were. Bits were much worse. But mostly, mostly, people were kinder… at least, unless political tribalism came up.