Comment by jkarneges

5 days ago

The HN/Firebase API doesn't make this easy. For https://hnstream.com I ended up crawling items to find the article.

Any tips on respectfully crawling HN so you don’t get throttled? I had an application idea that could not be served by the API (I need karma values), so I started writing code to scrape, but got rate limited pretty quickly.

  • I've had no trouble hitting the Firebase API at the speed items are created, with a 5 second delay between retries.

    For scraping HN directly, in my experience you have to go extremely slowly, like 1 minute between fetching items. And if you get blocked, it may be better to wait a long time (minutes) before trying again, rather than using exponential backoff, in order to get out of the penalty box. You'll need a cache for sure.
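    A rough sketch of that approach, in case it helps: a fixed gap between requests, one long flat wait after a block instead of exponential backoff, and an in-memory cache. Everything named here (`PoliteFetcher`, `urllib_fetch`, the 60 s and 600 s defaults) is made up for illustration, and treating any HTTP error as "blocked" is an assumption, not documented HN behavior.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional


def urllib_fetch(url: str) -> Optional[str]:
    """Hypothetical fetcher: returns the page body, or None if we look blocked.

    Assumption: any HTTP error status is treated as a block signal.
    """
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        return None


class PoliteFetcher:
    """Slow, cached fetcher with a flat penalty wait after a block."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]] = urllib_fetch,
        min_interval: float = 60.0,   # gap between normal requests (seconds)
        penalty_wait: float = 600.0,  # flat wait after a block (seconds)
        max_attempts: int = 3,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._penalty_wait = penalty_wait
        self._max_attempts = max_attempts
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._next_allowed = 0.0  # earliest time the next request may go out

    def get(self, url: str) -> str:
        # Serve from the cache first so repeat lookups cost no requests.
        if url in self._cache:
            return self._cache[url]
        for _ in range(self._max_attempts):
            wait = self._next_allowed - self._clock()
            if wait > 0:
                self._sleep(wait)
            body = self._fetch(url)
            if body is not None:
                self._next_allowed = self._clock() + self._min_interval
                self._cache[url] = body
                return body
            # Blocked: sit out a long, fixed penalty rather than backing off
            # exponentially from a short delay.
            self._next_allowed = self._clock() + self._penalty_wait
        raise RuntimeError(f"still blocked after {self._max_attempts} attempts: {url}")
```

    Usage would be something like `PoliteFetcher().get("https://news.ycombinator.com/item?id=1")`. The cache here lives only in memory; for a real crawl you'd probably want it on disk so a restart doesn't refetch everything.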

The comments don't even have a thread ID?