Comment by __alexander

3 months ago

Care to share the scraped data? I would love to play around with it.

__alexander

Not sure if I can. At the very least, the book descriptions most likely could not be distributed. There is an academic dataset with around 200M reviews, though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html

saberience 3 months ago

So you're OK with stealing the data yourself but not OK with providing it to others. Ironic.

I'm surprised he got that much data. Goodreads uses several tricks to try to stop scrapers, for example pagination only works up to a few pages.

jacquesm 3 months ago

They might send him a bill for use of resources.

cjaackie 3 months ago
  
I’m wondering how ethical it is to load a resource this way; I'm open to opinions. There is a mention of “I didn’t hammer down the servers”, but what does that really even mean? The site isn’t being used as intended, and I'm just curious how other people feel about that.

I am not sure about the legal side of things here, but a Kaggle dataset would be really cool.