Comment by roughly
3 months ago
AI isn't the one closing up shop, it’s the one looting all the stores and taking everything that isn’t bolted down. The AI companies are bad actors that are exploiting the openness of the internet in a fashion that was obviously going to lead to this result - the purpose of these scrapers is to grab everything they can and repackage it into a commercial product which doesn’t return anything to the original source. Of course this was going to break the internet, and people have been warning about that from the first moment these jackasses started - what the hell else was the outcome of all this going to be?
This is the same tune the MPAA and RIAA played when they used lawfare to destroy freedom online because pirates were the ones "break[ing] the internet."
Could you help me understand the difference between your point and the arguments the MPAA and RIAA used to ruin the lives of the torrent users they concluded were "thieves"?
As a rule of thumb, do you think the people who are happy that the services they contribute content to are open access, and who wish them to remain so, should be the ones forced to constantly migrate to new services to keep their content free?
When AI can perfectly replicate the browsing behavior of a human being, should Github restrict viewing a git repository to those who have verified blood biometrics or had their eyes scanned by an Orb? If they make that change, will you still place blame on "jackasses"?
The moral argument in favor of piracy was that it didn’t cost the companies anything and the uses were noncommercial. Neither of those applies to the AI scrapers - they’re aggressively overusing freely provided services (listen to some of the other folks in this thread about how the scrapers behave), and they’re doing so to create competing commercial products.
I’m not arguing you shouldn’t be annoyed by these changes, I’m arguing you should be mad at the right people. The scrapers violated the implicit contract of the open internet, and now that contract is being made more explicit. GitHub’s not actually a charity, but they’ve been able to provide a free service because the goodwill and community that come with it drive enough business to cover the cost of providing it. The scrapers have changed that math, as they have for every other site on the internet. You can’t loot a store and expect them not to upgrade the locks - as the saying goes, the enemy gets a vote on your strategy, too.
There are plenty of commercial pirates, and those commercial uses were grouped in with noncommercial sharing in much the same way you are now doing with scraping. Am I wrong in assuming most of this scraping comes from people using AI agents for things like AI-assisted coding? If an AI agent scrapes a page at a user's request (say, the one billionth git commit scraped today), do you consider that "loot[ing] a store"? What got looted? Is it the bandwidth? The CPU? Or does this require the assumption that the author of that commit wouldn't be excited that their work is being used?
I'd like to focus on your strongest point, which is the cost to the companies. I would love to know what that increase in cost actually looks like. You can install nginx on a tiny server and serve 10k rps of static content, or on the order of 50 (not 50k) rps when a typical web framework generates the same content dynamically. So that increase in cost has to be weighed against how efficiently the software serving the content is written.
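To make that concrete, here's a minimal sketch of the kind of setup I mean (the paths, hostname, and backend address are made up, not GitHub's actual stack): nginx serves static files straight off disk and puts a short-lived proxy cache in front of the dynamic pages, so repeated scraper hits don't each cost a full app-server render.

```nginx
# Goes in the http {} context: a small on-disk cache for rendered pages.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:50m max_size=1g;

server {
    listen 80;
    server_name example.com;              # hypothetical host

    # Static content: served straight off disk, cheap at ~10k rps.
    location /static/ {
        root /srv/www;
        expires 1h;
    }

    # Dynamic content: proxied to the app server, with a 60s cache so
    # a burst of identical scraper requests hits the backend only once.
    location / {
        proxy_pass http://127.0.0.1:8000; # hypothetical app backend
        proxy_cache pages;
        proxy_cache_valid 200 60s;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

Obviously GitHub's pages are far more dynamic than this, but the point stands: how much of the scraper traffic actually has to reach the expensive part of the stack?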
If this GitHub post included a bunch of numbers and details demonstrating that they have reached the end of the line on optimizing their web frontend, that they have run out of things to cache, and that the increase in costs is a real cause for concern for the company (not just a quick shave to the bottom line, not a bigger network/compute check written from GitHub to its owner), I'd throw my hands up with them and start rallying against the (unquestionably inefficient and borderline hostile) AI agent scrapers causing the increase in traffic.
Because they did not provide that information, I have to assume that GitHub and Microsoft are doing this out of pure profit motive and have abandoned any sense of commitment to open access to software. In fact, they have much to gain from building the walls of their garden as high as they can get away with, and I'm skeptical that their increase in costs is material at all.
I would rather support services that don't camouflage themselves as proponents of free and open software one day and as victims of a robbery the next. I still think this question is important and valid: there is a ton of software on GitHub written by users who wish for their work to remain open access. Is that the class of software and people you believe should be shuffled into smaller and smaller services that haven't yet abandoned the commitments that made them popular in the first place?
> Could you help me understand what the difference is
Well, the main difference is that this is being used to justify blocking, not demanding thousands of dollars.
> When AI can perfectly replicate the browsing behavior of a human being
They're still being jackasses, because I'm willing to pay to give free service to X humans but not to 20X bots pretending to be humans.