Comment by renewiltord
4 days ago
Everyone thinks that their specific pet thing is the precious commons and the other guy is the abuser. But in any case, one should be able to follow the reasoning.
If blocking ads is permissible because the server cannot control the client but can only control itself, then so is “scraping”. Both services ask of their clients something they cannot enforce, and both find that the clients refuse.
If you find the justification valid but decide that the conclusion is nonetheless absurd, you must identify which step in the reasoning fails. The temptation is epicyclic: corporations vs. humans, or something of the sort; commercial vs. non-commercial.
But on its own that distinction is no justification. It’s just that your principles lead you to absurdity and you refuse to revisit them, because you like taking from others but don’t like when others take from you. A fairly simple answer. Nothing for Occam’s Razor to divide.
Particularly believable because the arrival of AI models trained on the world seems to have coincided with some kind of copyright maximalism that this forum has never seen before. Were the advocates of the RIAA simply not users yet?
Or, more believably, is it just that taking feels good but being taken from feels bad?
I don't say this lightly, but I don't think you read my reply, or at least didn't understand its implications, especially because you don't actually argue against anything I say. You only make generic statements about justifications and logical conclusions, and conclude with assumptions about the RIAA.
I stated that the open internet as a whole is the commons, not any specific person's pet project, and thus that AI scraping (or any bulk scraping done commonly and wholesale) makes it untenable for most people to keep participating. Twitter, for example, has gone your preferred way, mostly requiring authentication for access. There are many arguments on HN about whether that's a good move, or even one that others could make and expect to succeed, and that's a huge platform. Just recently there have been front-page posts on HN about bringing back personal blogs, and also posts about how personal blogs not behind the great wall of Cloudflare saw TBs of "false" traffic from scrapers, which costs real money.
I stated that I think piracy, ad blocking, and AI scraping are part of the same spectrum. The justification for ad blocking carries a much lower burden than the justification for AI scraping at the scale where it takes multiple IPs, and where whitelisting is argued to be the only way to stop it, because of the size of the effect you are having.
Much like how bandwidth is priced differently whether you use less than 100 MB or more than 1 TB, or how delivering a 10 lb package is far cheaper than delivering a 1,000 lb one, or how at some level of effort times repetition it makes sense to automate something programmatically rather than doing it manually. There are of course situations where each makes sense, but the expectations vary, and the results are not always linear in the inputs. And this completely ignores the social aspect, which adds a whole new layer of complexity with its own logic.
Scraping (or access without ads, i.e. ad blocking, or outside sharing of data, i.e. piracy) has always drawn complaints from those whose data people want to scrape, e.g. airlines, HBO, or Disney. It's just that now all data is being scraped absolutely non-stop, to the detriment of many and the gain of few, so everyone has a reason to complain. It also explains why people hold differing opinions.
I think everyone is fine with scraping what is already public. But a lot of scrapers are effectively a denial of service. If I have 1 TB of bandwidth from my provider and usually only 10% of it is consumed, it's really difficult not to blame someone who slurps it up in an hour and prevents anyone else from accessing the content.
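To make the "slurping" point concrete: the standard defense a small site has short of full IP blocking or whitelisting is per-client rate limiting. A minimal sketch of a token-bucket limiter (all numbers hypothetical, not taken from any particular server setup) shows why a normal reader is untouched while a bulk scraper gets throttled:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows short bursts up to `capacity`,
    but caps sustained throughput at `rate` requests per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical per-client policy: 2 requests/second sustained, bursts of 10.
bucket = TokenBucket(rate=2, capacity=10)
results = [bucket.allow() for _ in range(15)]
# A human clicking around never hits the limit; a scraper firing 15
# back-to-back requests gets the burst allowance and is then refused
# until tokens refill.
print(results.count(True))
```

A per-IP dictionary of such buckets is the usual shape; the catch the parent comments gesture at is that scrapers rotating across many IPs get a fresh bucket each time, which is exactly why operators escalate to Cloudflare-style fronting or whitelisting.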