Comment by diffeomorphism
4 days ago
The robots.txt is pretty explicit that this scraping is "disallowed":
https://www.goodreads.com/robots.txt
So legalities aside, this seems unethical.
Why would it be unethical?
This obsession with "everything must be commercialized" is really killing creativity.
Now if the author were commercializing other people's reviews, sure, it's potentially(!) unethical. But scraping a website for reviews that are publicly(!) posted, training a recommendation LLM and then sharing it, for free, seems ... exactly the ideal use case for this technology.
It is truly criminal that such a bright and brilliant model of ethics, Amazon, should endure such an attack.
Unethical behavior does not become good just because it happens to hurt "bad people" (or more accurately, companies bought by bad people).
Using a sword to stab someone is evil, therefore, stabbing someone who is stabbing me with a sword is evil?
I agree. As a frequent reviewer on Goodreads, this feels really icky.
You are right.
At the same time, everything you ever posted online has already been scraped by hundreds (maybe thousands) of entities and distributed/sold to countless other entities. The only difference is that OP shared his project here.
If it's unethical it's not because of what the robots.txt says.
Blindly violating it is bad manners, but deliberately scraping a single website over a month isn't the worst.