Everything is included by default, and I maintain a long list of not-blog domains that are excluded.[0] On top of that, I exclude the Alexa top 500.
There are still lots of not-blogs in the dataset, but I only exclude them when I come across them in popular views. I'm sure that if you dig through positions 101-5000, you'll find lots of domains that don't match my official criteria for a blog.
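Roughly, the filtering logic looks like the sketch below. This is a minimal illustration, not the actual code: the function name, the www-stripping step, and the sample domains are all hypothetical, and the two sets stand in for the exclusion list and the Alexa top 500.

```python
def is_candidate_blog(domain, excluded_domains, top_sites):
    """Default-include: keep a domain unless one of the lists rules it out."""
    d = domain.lower()
    # Assumption: treat "www.example.com" and "example.com" as the same domain.
    if d.startswith("www."):
        d = d[4:]
    return d not in excluded_domains and d not in top_sites


# Hypothetical sample data, not taken from the real lists.
excluded_domains = {"example-news.com"}  # stands in for the not-blog list
top_sites = {"github.com"}               # stands in for the Alexa top 500
domains = ["www.example-blog.net", "github.com", "example-news.com"]

kept = [d for d in domains if is_candidate_blog(d, excluded_domains, top_sites)]
print(kept)
```

Because inclusion is the default, anything missing from both lists passes through, which is why not-blogs can linger further down the rankings until someone spots them.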
[0] https://github.com/mtlynch/hn-popularity-contest-data/blob/m...