Comment by zX41ZdbW

5 days ago

I host a publicly open database with Hacker News data at https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...

So you can create any similar service with a single SQL query and an HTML page.

I also host it as a publicly accessible data lake, which you can query from anywhere: https://github.com/ClickHouse/ClickHouse/issues/29693#issuec...

It is also updated in real-time.

This is awesome!

I do want to point out that the data in that ClickHouse playground only seems to go as far back as April 6, 2024 according to the query below:

  SELECT * FROM hackernews_history ORDER BY update_time ASC LIMIT 10

This is of course still extremely useful, and generous! It just wasn't obvious from the comment that this isn't querying against all Hacker News data.
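
For anyone who wants to check the covered range without eyeballing ten sorted rows, here is a minimal sketch. It runs an aggregate over a throwaway in-memory SQLite table rather than the playground itself, so the rows are invented for illustration; only the `hackernews_history` table name and the `update_time` column come from the query above.

```python
import sqlite3

# Hypothetical stand-in for the playground table. Only the table name and
# the update_time column are taken from the thread; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hackernews_history (id INTEGER, update_time TEXT)")
conn.executemany(
    "INSERT INTO hackernews_history VALUES (?, ?)",
    [
        (1, "2024-04-06 16:35:00"),
        (2, "2024-09-12 08:00:00"),
        (3, "2025-01-03 21:10:00"),
    ],
)

# One aggregate returns both ends of the range in a single row.
# ISO-formatted timestamps sort correctly as text, so min/max work here.
RANGE_QUERY = "SELECT min(update_time), max(update_time) FROM hackernews_history"
oldest, newest = conn.execute(RANGE_QUERY).fetchone()
print(oldest, newest)
```

The same SELECT should presumably work unchanged in the playground, assuming `update_time` is a date-time column there.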

Thank you for providing this, you are a hero!!! I'm gonna try to do cool stuff with it!

It probably also got swamped in real-time...

  • Do you mean it's not updated? The output may look sorted, but you gotta sort by the update_time column explicitly, with a query like:

    SELECT * FROM hackernews_history
    ORDER BY update_time DESC
    LIMIT 100;

    And yeah, I got that from deepseek because I don't have a brain.

oh hey, per HN terms and conditions I license my HN data only to HN. Can you please remove my data from the set? Thank you!

  • Not sure if joking, but if this product is not republishing the text of your contributions (to which you hold copyright), you’re probably not going to convince a court to do anything here.

    Generally speaking it is not a violation to scrape, index, and analyze web content as long as you don’t republish copyrighted content without a license, or violate access controls. For example: search engine indexes.

  • > By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.

    @zX41ZdbW, you can safely ignore this guy.

    @GeoAtreides, next time read the actual terms of service before hallucinating.

    • > for any Y Combinator-related purpose

      That is actually the key phrase. HN can provide the API, no problem. People can consume the API, no problem. But I'd ask an attorney whether API consumers can then re-release the data for purposes not related to YC. By my reading, they cannot.

  • Steve Carell yelling “I DECLARE BANKRUPTCY!!” in The Office dot gif

    • Is this GDPR territory, with fines up to EUR 10 million or 2% of a company’s global annual turnover? Not sure what the fines are for some random person, though.

  • Wait, so I have to ask for every single person's permission to use this data?

    uhhhhhhhhhhhhhhhhhhhhhh