Comment by woodruffw

3 years ago

HN has an Algolia-based API. It’s also very easy to crawl.
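As a rough sketch of how little it takes to pull one user's comments through that API: the snippet below assumes the public HN Search endpoint `search_by_date` with the `tags`, `page`, and `hitsPerPage` parameters and a `comment_text` field on each hit. The function names are made up for illustration, so check the API docs before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Algolia-backed HN Search API.
API_BASE = "https://hn.algolia.com/api/v1/search_by_date"


def build_comments_url(author, page=0, hits_per_page=100):
    """Build a query URL for one page of a user's comments (hypothetical helper)."""
    params = {
        "tags": f"comment,author_{author}",  # comments only, by this author
        "page": page,
        "hitsPerPage": hits_per_page,
    }
    return f"{API_BASE}?{urlencode(params)}"


def fetch_comments(author, page=0):
    """Fetch one page of comment bodies; requires network access."""
    with urlopen(build_comments_url(author, page)) as resp:
        data = json.load(resp)
    # Assumes each hit carries the comment body in "comment_text".
    return [hit.get("comment_text") or "" for hit in data.get("hits", [])]
```

Calling `fetch_comments("some_username")` in a loop over `page` would be enough to assemble a per-user corpus, which is really the whole point: the data is not hard to get.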

I wouldn’t call this evil, however: it’s merely demonstrating a technique that you should be aware of if you’re privacy-conscious. It looks like they also provide some resources for avoiding stylometric detection.

I would bet my bottom dollar that the likes of Reddit and Google already have models to turn a corpus of text into probable demographic data and models to measure the similarity of users.
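To give a feel for what "measuring the similarity of users" can look like at its simplest, here is a toy sketch using character trigram counts and cosine similarity. This is only an assumption about one basic stylometric approach, not how the linked project or any company actually does it, and the function names are made up for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    # Counter returns 0 for missing keys, so unmatched n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text was too short to produce any n-grams
    return dot / (norm_a * norm_b)


def author_similarity(corpus_a, corpus_b, n=3):
    """Compare two authors, each given as a list of their comment strings."""
    return cosine_similarity(
        char_ngrams(" ".join(corpus_a), n),
        char_ngrams(" ".join(corpus_b), n),
    )
```

A score near 1 means the two corpora use very similar character sequences, and a score near 0 means almost no overlap. Real systems would presumably use far richer features, but even something this crude picks up on habits like punctuation and favorite phrases.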