Comment by sakai
13 years ago
I'll consider that a challenge...
To anyone who has experience getting comment data from HN -- what's the fastest, most polite way to do this? And am I correct in remembering that there's some aggressive rate-limiting for crawling the site?
Don't crawl the site, please. The place to get data is the HNSearch API.
There's a database (quite old) of HN posts and comments here:
http://www.btscene.eu/details/2240774/Hacker+News+Database+o...
Any plans to try to get a dataset for supervised ML? Perhaps collect the top 4 comments from all front page threads and post a survey on HN asking HNers to rate those comments for skepticism/dismissiveness?
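A rough sketch of what that collection step might look like, assuming the threads have already been obtained (not by crawling) as plain dicts with their comments in ranked order. The field names (`id`, `comments`, `text`), the function names, and the numeric rating scale are all made up for illustration:

```python
from statistics import mean

def top_comments(threads, n=4):
    """Return (thread_id, comment_id, text) for the first n comments of each thread.

    Assumes each thread's "comments" list is already in ranked order.
    """
    rows = []
    for thread in threads:
        for comment in thread["comments"][:n]:
            rows.append((thread["id"], comment["id"], comment["text"]))
    return rows

def aggregate_labels(ratings):
    """Average the survey scores per comment: {comment_id: [scores]} -> {comment_id: mean}.

    Comments that received no ratings are skipped.
    """
    return {cid: mean(scores) for cid, scores in ratings.items() if scores}
```

The averaged scores could then serve as the labels, though you'd probably want several raters per comment before trusting them.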