Comment by dspillett
1 day ago
Is there a public dump of the data anywhere that this is based upon, or have they scraped it themselves?
Such a DB might be entertaining to play with, and the threadedness of comments would be useful for beginners to practise efficient recursive queries (more so than the StackExchange dumps, for instance).
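For anyone wondering what such a recursive query might look like, here is a minimal sketch using Python's built-in sqlite3 and a recursive CTE. The `comments` table, its columns, the sample rows and the `thread` helper are all made up for illustration; they are not the schema of any actual HN dump.

```python
import sqlite3

# Hypothetical toy schema: each comment points at its parent (NULL for top level).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER, author TEXT)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        (1, None, "alice"),
        (2, 1, "bob"),
        (3, 2, "carol"),
        (4, 1, "dave"),
        (5, None, "erin"),
    ],
)


def thread(conn, root_id):
    """Return (id, author, depth) for root_id and every comment below it."""
    sql = """
    WITH RECURSIVE thread(id, author, depth) AS (
        -- anchor: the comment we start from
        SELECT id, author, 0 FROM comments WHERE id = ?
        UNION ALL
        -- recursive step: direct replies to anything already in the thread
        SELECT c.id, c.author, t.depth + 1
        FROM comments AS c
        JOIN thread AS t ON c.parent_id = t.id
    )
    SELECT id, author, depth FROM thread ORDER BY depth, id
    """
    return conn.execute(sql, (root_id,)).fetchall()


for comment_id, author, depth in thread(conn, 1):
    print("  " * depth + f"{comment_id} {author}")
```

On a real dump an index on `parent_id` would probably matter a lot, since the recursive step looks up replies by parent on every iteration.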
While not a dump per se, there is an API where you can get HN data programmatically, no scraping needed.
https://github.com/HackerNews/API
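As a rough sketch of what that could look like: the API serves each item as JSON at `/v0/item/<id>.json`, and an item's replies are listed by ID in its `kids` field. The helper names `fetch_item` and `walk_thread` below are mine, not part of the API, and this does one request per comment with no rate limiting or retries, so treat it as a toy.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def fetch_item(item_id):
    """Fetch one item (story, comment, ...) as a dict; may be None for missing items."""
    with urlopen(f"{BASE_URL}/item/{item_id}.json", timeout=10) as resp:
        return json.load(resp)


def walk_thread(item_id, fetch=fetch_item, depth=0):
    """Yield (depth, item) for an item and all of its descendants, depth-first."""
    item = fetch(item_id)
    if item is None:  # the API can return null for an item
        return
    yield depth, item
    for kid_id in item.get("kids", []):
        yield from walk_thread(kid_id, fetch, depth + 1)
```

Passing `fetch` as a parameter means the traversal can be tried against a local dict (for example `fetch=some_dict.get`) before pointing it at the network.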
Yes, you can see the download HN bash script in the repository now. It simply extracts the data from BigQuery to your local machine and saves it as a series of gzipped JSON files.
Ah, the repo was 404ing for me last time I checked (seems fine now) so I couldn't inspect that. I'll have a play later.