Comment by osakasake
4 days ago
If you want some more feedback: why are you using Cloudflare Workers, which presumably cost you money? You can retrieve all of the HN content with a regular PC pretty easily. I’m talking about a single core running a Python program with minimal RAM.
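For illustration, here is a minimal sketch of what such a single-core script might look like. It assumes the public HN Firebase API (`/v0/maxitem.json` and `/v0/item/<id>.json`); the names `archive`, `fetch_json`, `main`, and the output file `hn_items.jsonl` are made up for this example, and it is not the code either poster actually runs.

```python
import json
import time
import urllib.request

# Assumption: the public Hacker News Firebase API base URL.
API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def archive(start_id, end_id, out, fetch=fetch_json, delay=0.0):
    """Write items start_id..end_id (inclusive) to `out` as JSON lines.

    Returns the number of items written. `fetch` is injectable so the
    loop can be exercised without network access.
    """
    written = 0
    for item_id in range(start_id, end_id + 1):
        item = fetch(f"{API}/item/{item_id}.json")
        if item is None:  # ids with no stored item decode to null
            continue
        out.write(json.dumps(item) + "\n")
        written += 1
        if delay:
            time.sleep(delay)  # optional pause between requests
    return written


def main():
    # Hypothetical usage: archive the 100 most recent item ids.
    max_id = fetch_json(f"{API}/maxitem.json")
    with open("hn_items.jsonl", "a", encoding="utf-8") as f:
        print(archive(max_id - 99, max_id, f), "items written")
```

Calling `main()` appends the latest 100 items to a local file. A full archive would walk from id 1 up to the max id and would need retry and resume logic, which this sketch leaves out.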
You're right that a simple Python script would be more cost-effective for this kind of archiving. I went with Workers because I was already familiar with the stack and wanted real-time processing, but for a research project focused on completeness rather than latency, your approach makes much more sense. Please reach out if you want to offer your help. Initially I was planning on building a public real-time dashboard, and I might still do that.