← Back to context

Comment by llimllib

9 years ago

Here's a google bigquery that lists the most common PDFs referenced in the github sample dataset, and the top 100 results: https://gist.github.com/llimllib/3f1877eab06208958060f491cf3...

It's possible to run this query against the full github dataset but I couldn't figure out how to pay for it, so if somebody wants to do that it would be excellent.

just a note: it's bizarre that I absolutely cannot find a way to determine a) how much it would cost to run or b) how I would pay for it if I wanted to run it