Comment by blinding-streak

5 days ago

Cool!

Would be slightly more contextual if the title of the original post was displayed.

The HN/Firebase API doesn't make this easy. For https://hnstream.com I ended up crawling items to find the article.

  • Any tips on respectfully crawling HN so you don’t get throttled? I had an application idea that could not be served by the API (need karma values) so I started to write code to scrape but got rate limited pretty quickly.

    • I've had no trouble hitting the Firebase API at the speed items are created, with a 5 second delay between retries.

      For scraping HN directly, in my experience you have to go extremely slow, like 1 minute between fetching items. And if you get blocked, it may be better to wait a long time (minutes) before trying again rather than exponential backoff, in order to get out of the penalty box. You'll need a cache for sure.

  • The comments don't even have a thread ID?

    • Comment items look like https://hacker-news.firebaseio.com/v0/item/45533616.json?pri...:

        {
          "by" : "jkarneges",
          "id" : 45533018,
          "kids" : [ 45533616 ],
          "parent" : 45532549,
          "text" : "The HN&#x2F;Firebase API doesn&#x27;t make this easy. For <a href=\"https:&#x2F;&#x2F;hnstream.com\" rel=\"nofollow\">https:&#x2F;&#x2F;hnstream.com</a> I ended up   crawling items to find the article.",
          "time" : 1760043552,
          "type" : "comment"
        }
      

      "parent" can either be the actual parent comment or the parent article, depending where in the comment chain you are.

      4 replies →

I tried this, but it required making a request for every comment and would probably call for a backend, wheras this can run just off of the Firebase websocket stream on a static HTML file.