erekp 4 days ago How exactly do you fall back to Common Crawl? Isn't the cost to even hold and query Common Crawl insane?
andrethegiant 4 days ago With AWS Athena, you can query the contents of someone else’s public S3 bucket. You pay per read, but if you craft your query the right way then it’s very inexpensive. Each query I run only scans about 1MB of data.
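A rough sketch of what "crafting the query the right way" could look like, not necessarily what the commenter actually runs. It assumes Common Crawl's columnar URL index has been registered in Athena as `ccindex.ccindex`, with `crawl` and `subset` as partition columns and the column names shown below; check all of these against your own table definition. The helper `build_index_query` and the crawl label in the usage line are made up for illustration. The idea is that filtering on the partition columns and selecting only a few columns from the Parquet files keeps the scanned bytes small.

```python
def build_index_query(domain: str, crawl: str, limit: int = 10) -> str:
    """Build an Athena SQL string that looks up WARC locations for one domain.

    Assumed (not confirmed by the thread): table ccindex.ccindex, partition
    columns crawl and subset, and the column names used in the SELECT list.
    """
    # Minimal quote escaping for the sketch; prefer Athena's parameterized
    # queries for anything that takes untrusted input.
    safe_domain = domain.replace("'", "''")
    safe_crawl = crawl.replace("'", "''")
    return (
        # Select only the columns needed to locate a record in a WARC file.
        "SELECT url, warc_filename, warc_record_offset, warc_record_length\n"
        "FROM ccindex.ccindex\n"
        # Partition filters: restrict the scan to one crawl and one subset.
        f"WHERE crawl = '{safe_crawl}'\n"
        "  AND subset = 'warc'\n"
        f"  AND url_host_registered_domain = '{safe_domain}'\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # "CC-MAIN-2024-10" is a placeholder crawl label.
    print(build_index_query("example.com", "CC-MAIN-2024-10"))
```

The returned offset and length can then be used for a ranged read of the single WARC record, so the full archive never has to be downloaded or stored.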
wfn 3 days ago Since I was just looking at this accidentally, here are some examples of how to query at a ~cent-per-query cost level (just examples but quite illustrative): https://commoncrawl.org/blog/index-to-warc-files-and-urls-in...
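For a back-of-the-envelope check on the per-query cost, here is a small sketch. The price of $5 per TB scanned, the 10 MB minimum billed per query, and the decimal definition of a terabyte are all assumptions; confirm them against current Athena pricing for your region. `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Assumed pricing inputs; verify against the current Athena pricing page.
PRICE_PER_TB_USD = 5.0
MIN_BILLED_BYTES = 10 * 1000 * 1000  # assumed 10 MB minimum per query
BYTES_PER_TB = 1000 ** 4             # assumed decimal terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the USD cost of one query from the bytes it scanned."""
    # Queries scanning less than the minimum are billed at the minimum.
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    for scanned in (1_000_000, 2 * 1000 ** 3):
        print(f"{scanned} bytes scanned: ${estimate_query_cost(scanned):.6f}")
```

Under these assumptions, a query that scans about 1MB is billed at the minimum, and a cent-per-query query corresponds to roughly 2 GB scanned.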