Comment by sodafountan
18 hours ago
The GitHub page is no longer available, which is a shame because I'm really interested in how this works.
How was the entirety of HN stored in a single SQLite database? In other words, how was the data acquired? And how does the page load instantly if 22GB of data has to be downloaded to the browser?
You can see it now; I forgot to make it public.
1. download_hn.sh - a bash script that queries BigQuery and saves the data as *.json.gz files (first sketch below).
2. etl-hn.js - does the sharding and builds the ID -> shard map, plus the user stats shards (second and third sketches below).
3. Then either npx serve docs locally, or upload to Cloudflare Pages.
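For the acquisition step, here is the same idea as download_hn.sh shown in Node rather than bash (a rough sketch: the public dataset name is real, but the column list, filter, and output filename are illustrative):

```js
// Sketch of the download step (the real download_hn.sh is a bash script;
// this shows the same idea with Google's official Node BigQuery client).
const { BigQuery } = require('@google-cloud/bigquery');
const fs = require('fs');
const zlib = require('zlib');

async function downloadHn() {
  const bq = new BigQuery();
  // bigquery-public-data.hacker_news.full is BigQuery's public HN dataset.
  const [rows] = await bq.query({
    query:
      'SELECT id, type, `by`, time, title, text, url, parent, score ' +
      'FROM `bigquery-public-data.hacker_news.full` ' +
      'WHERE id < 1000000', // illustrative slice; the real script pulls everything
  });
  // Save as gzipped newline-delimited JSON, matching the *.json.gz output.
  const ndjson = rows.map((row) => JSON.stringify(row)).join('\n');
  fs.writeFileSync('hn-000.json.gz', zlib.gzipSync(ndjson));
}

downloadHn().catch(console.error);
```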
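The sharding in etl-hn.js is what makes the instant load possible: the browser never downloads the 22GB, it only ever fetches the one small shard holding the item it wants. A minimal sketch of that step, assuming fixed-size ID ranges so the ID -> shard map is just integer division (shard size, paths, and field names are guesses, not the actual script):

```js
// Simplified sketch of the shard step (not the actual etl-hn.js).
// Fixed-size ID ranges mean the ID -> shard map is just integer division.
const fs = require('fs');
const zlib = require('zlib');

const SHARD_SIZE = 100000; // items per shard; illustrative

// Read one downloaded dump: gzipped, newline-delimited JSON.
const lines = zlib.gunzipSync(fs.readFileSync('hn-000.json.gz'))
  .toString('utf8')
  .split('\n')
  .filter(Boolean);

// Bucket items by shard ID.
const shards = new Map();
for (const line of lines) {
  const item = JSON.parse(line);
  const shardId = Math.floor(item.id / SHARD_SIZE);
  if (!shards.has(shardId)) shards.set(shardId, []);
  shards.get(shardId).push(item);
}

// Write each shard as its own small gzipped file under docs/, so the
// static site only ever fetches the one shard holding the item it needs.
fs.mkdirSync('docs/shards', { recursive: true });
for (const [shardId, items] of shards) {
  fs.writeFileSync(
    `docs/shards/${shardId}.json.gz`,
    zlib.gzipSync(JSON.stringify(items))
  );
}
```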
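And the corresponding client-side lookup is then just arithmetic plus one small fetch (same assumed layout; DecompressionStream is built into modern browsers and handles the pre-gzipped shard files):

```js
// Browser-side sketch: resolve an item ID to its shard and fetch only that.
const SHARD_SIZE = 100000; // must match the value the ETL step used

async function loadItem(id) {
  const shardId = Math.floor(id / SHARD_SIZE); // ID -> shard is pure arithmetic
  const res = await fetch(`shards/${shardId}.json.gz`);
  // Shards are stored pre-gzipped, so decompress in the browser.
  const stream = res.body.pipeThrough(new DecompressionStream('gzip'));
  const items = await new Response(stream).json();
  return items.find((item) => item.id === id);
}

// e.g. loadItem(8863).then((item) => console.log(item.title));
```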
The ./tools/predeploy-checks.sh script basically runs the entire pipeline; you can run it unattended with AUTO_RUN=true.
Awesome, I'll take a look