Comment by voxic11
3 hours ago
That is just the archive part, if you just would finish reading the paragraph you would know that updates since 2026-03-16 23:55 UTC are "are fetched every 5 minutes and committed directly as individual Parquet files through an automated live pipeline, so the dataset stays current with the site itself."
So to get all the data you need to grab the archive and all the 5 minute update files.
archive data is here https://huggingface.co/datasets/open-index/hacker-news/tree/...
update files are here (I know that its called "today" but it actually includes all the update files which span multiple days at this point) https://huggingface.co/datasets/open-index/hacker-news/tree/...
>if you just would finish reading the paragraph
probably uncalled for
not really since original comment completely missed it
not to be "that guy" but it is pretty explicitly laid out in the guidelines, with an example and everything