Comment by pogue
3 months ago
I assume they're trying to keep AI bots from strip mining the whole place.
Or maybe your IP/browser is questionable.
What's being strip mined is the openness of the Internet, and AI isn't the one closing up shop. GitHub was created for collaborating on and sharing source code. The company in the best position to maximize access to free and open software is now just a dragon guarding other people's coins.
The future is a .txt file of John Carmack pointing out how efficient software used to be, locked behind a repeating WAF captcha, forever.
AI isn't the one closing up shop, it’s the one looting all the stores and taking everything that isn’t bolted down. The AI companies are bad actors that are exploiting the openness of the internet in a fashion that was obviously going to lead to this result - the purpose of these scrapers is to grab everything they can and repackage it into a commercial product which doesn’t return anything to the original source. Of course this was going to break the internet, and people have been warning about that from the first moment these jackasses started - what the hell else was the outcome of all this going to be?
This is the same tune the MPAA and RIAA played when they used lawfare to destroy freedom online, claiming pirates were the ones "break[ing] the internet."
Could you help me understand the difference between your point and the arguments the MPAA and RIAA used to ruin the lives of torrent users they deemed "thieves"?
As a rule of thumb, should people who are happy that the services they contribute to are open access, and who want them to stay that way, be the ones forced to keep migrating to new services to keep their content free?
When AI can perfectly replicate the browsing behavior of a human being, should GitHub restrict viewing a git repository to those who have verified blood biometrics or had their eyes scanned by an Orb? If they make that change, will you still place blame on "jackasses"?
Free and open source software is on GitHub, but AI and other crawlers do not respect the licenses. As someone who writes a lot of code under specific FOSS licenses, I welcome any change that makes it harder for machines to take my code and just steal it.
I encountered this on GitHub last week. Very aggressive rate limiting. My browser and IP are very ordinary.
Since Microsoft is struggling to make ends meet, maybe they could throw up a captcha or a proof-of-work challenge like Anubis by Xe Iaso.
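For readers unfamiliar with the idea: a proof-of-work gate makes each visitor's browser burn a little CPU before it gets the page. That cost is negligible for one human but adds up for a crawler fetching millions of pages. Below is a minimal hashcash-style sketch, assuming SHA-256 and a difficulty measured in leading zero bits. It illustrates the general technique only and is not Anubis's actual protocol or code; the function names are hypothetical.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: try nonces until the hash has `difficulty` leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical: server issues a fresh random challenge per visitor
    nonce = solve(challenge, difficulty=16)  # about 65,536 attempts on average
    print(nonce, verify(challenge, nonce, difficulty=16))
```

The asymmetry is the point: the client needs about 2^difficulty hash attempts on average, while the server verifies with a single hash.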
They already disabled code search for unauthenticated users. It's totally plausible they will disable code browsing as well.
That hit me, too. I thought it was an accidental bug and didn’t realize it was actually malice.
Just sign in if it's an issue for your usage.
My usage isn't high. I was rate limited to like 5 requests per minute. It was a repo with several small files.
And seriously, if they keep limiting the web interface while still allowing unauthenticated cloning, I'd rather clone the repo than log in.
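A limit like the "5 requests per minute" described above is commonly implemented as a sliding window per client. The sketch below is hypothetical: GitHub hasn't published how its limiter works, and keying on the client's IP address and the 5-per-60-seconds numbers are assumptions taken from this comment. Over the limit, the server answers with HTTP 429 Too Many Requests.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int = 5, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # client key (assumed here to be an IP address) -> timestamps of allowed requests
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str, now: Optional[float] = None) -> int:
        """Return 200 if the request is allowed, 429 (Too Many Requests) otherwise."""
        now = time.monotonic() if now is None else now
        recent = self.hits[key]
        # Forget requests that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return 429
        recent.append(now)
        return 200


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=5, window=60.0)
    # Six requests one second apart from the same address: the sixth is refused.
    print([limiter.check("203.0.113.7", now=float(t)) for t in range(6)])
    # → [200, 200, 200, 200, 200, 429]
```

Rejected requests aren't recorded, so a client that backs off regains access as soon as its oldest allowed request ages out of the window.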
GitHub code browsing has gone south since Microsoft bought them anyway. Having a simple proxy that clones a repo and serves it would solve problems with rate limits and their awful UX.
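A minimal sketch of that proxy idea, assuming `git` is installed and that unauthenticated cloning stays allowed: shallow-clone the repository once (or fast-forward an existing checkout), then serve the files locally with Python's standard-library HTTP server. The helper names (`git_sync_command`, `sync`, `make_server`) are hypothetical. There is no syntax highlighting or blame view, just a directory listing.

```python
import functools
import os
import subprocess
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from typing import List


def git_sync_command(repo_url: str, dest: str) -> List[str]:
    """Build the git command: shallow clone on first use, fast-forward pull afterwards."""
    if os.path.isdir(os.path.join(dest, ".git")):
        return ["git", "-C", dest, "pull", "--ff-only"]
    return ["git", "clone", "--depth", "1", repo_url, dest]


def sync(repo_url: str, dest: str) -> None:
    """Clone or update the local mirror (one git operation instead of many page views)."""
    subprocess.run(git_sync_command(repo_url, dest), check=True)


def make_server(dest: str, host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Serve the checked-out files; bound to localhost because .git is exposed too."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=dest)
    return ThreadingHTTPServer((host, port), handler)
```

Usage: call `sync("https://github.com/OWNER/REPO.git", "repo")` with a placeholder URL of your choice, then `make_server("repo").serve_forever()` and browse http://127.0.0.1:8000/. Every page view after that hits the local copy, so only the clone or pull touches GitHub.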
> Or maybe your IP/browser is questionable.
I'm using Firefox and Brave on Linux from a residential internet provider in Europe, and the 429 error triggers consistently on both browsers. I wouldn't consider my setup questionable given their target audience.
I’m browsing from an iPhone in Europe right now and can browse source code just fine without being logged in.
Then it means they're looking at the User-Agent string and determining that an iPhone in Europe most likely has a human using it, and might not require rate-limiting.
*Other AI bots; MS will obviously mine anything on there.
Personally, I like sourcehut (sr.ht)
Same way Reddit sells all its content to Google, then stops everyone else from getting it. Same way Stack Overflow sells all its content to Google, then stops everyone else from getting it.
(Joke's on Reddit, though, because Reddit content has become pretty worthless since they did this, and everything from before was already publicly archived.)
Other bots or MS bots too?