This will be one of the big fights of the next couple years. On what terms can an Agent morally and legally claim to be a user?
As a user I want the agent to be my full proxy. As a website operator I don’t want a mob of bots draining my resources.
Perhaps a good analogy is Mint and the bank-account scraping it had to do in the 2010s, because no bank offered APIs with scoped permissions. Plenty of customers complained, and after Plaid turned scraping into a big business, the banks eventually relented and built the scalable solution.
The technical solution here is probably some combination of offering MCP endpoints for your actions, and some direct blob store access for static content. (Maybe even figuring out how to bill content loading to the consumer so agents foot the bill.)
It's impossible to solve. A sufficiently capable agent can control a device that records the user's screen and drives their keyboard/mouse, and current LLMs basically pass the Turing test.
IMO it's not worth solving anyway. Why do sites have CAPTCHA?
- To prevent spam, use rate limiting, proof-of-work (a sketch follows below), or micropayments. To prevent fake accounts, use identity.
- To get ad revenue, use micropayments (web ads are already circumvented by uBlock and co).
- To prevent cheating in games, use skill-based matchmaking or friend-group-only matchmaking (e.g. only match with friends, friends of friends, etc. assuming people don't friend cheaters), and make eSport players record themselves during competition if they're not in-person.
What other reasons are there? (I'm genuinely interested and it may reveal upcoming problems -> opportunities for new software.)
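To make the proof-of-work option above concrete: the shape of it is a hashcash-style challenge, where issuing and verifying cost one hash but solving costs millions. A minimal sketch; the difficulty value and names are mine, not any production scheme:

    import hashlib, os

    DIFFICULTY = 20  # leading zero bits required; tune for acceptable client-side cost

    def issue_challenge() -> str:
        # server hands out a fresh random nonce with each form load
        return os.urandom(8).hex()

    def solve(challenge: str) -> int:
        # client grinds counters until the hash clears the target
        counter, target = 0, 1 << (256 - DIFFICULTY)
        while int.from_bytes(hashlib.sha256(f"{challenge}:{counter}".encode()).digest(), "big") >= target:
            counter += 1
        return counter

    def verify(challenge: str, counter: int) -> bool:
        # server-side check is a single hash, so issuing stays far cheaper than solving
        digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY))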
People just confidently stating stuff like "current LLMs basically pass the Turing test" makes me feel like I've secretly been given much worse versions of all the LLMs in some kind of study. It's so divorced from my experience of these tools, I genuinely don't really understand how my experience can be so far from yours, unless "basically" is doing a lot of heavy lifting here.
I had a simple game website with a sign-up form that asked only for an email address. It went years with no issue. Then suddenly: hundreds of signups with random email addresses, every single day.
The sign up form only serves to link saved state to an account so a user could access game history later, there are no gated features. No clue what they could possibly gain from doing this, other than to just get email providers to all mark my domain as spam (which they successfully did).
The site can't make any money, and had only about 1 legit visitor a week, so I just put a cloudflare captcha in front of it and called it a day.
Google at least uses captchas to gather training data for computer vision ML models. That's why they show pictures of stop lights and buses and motorcycles - so they can train self-driving cars.
It's absolutely possible to solve; you're just not seeing the solution because you're blinded by technical solutions.
These situations will commonly be characterized by: a hundred billion dollar company's computer systems abusing the computer systems of another hundred billion dollar company. There are literally existing laws which have things to say about this.
There are legitimate technical problems in this domain when it comes to adversarial AI access. That's something we'll need to solve for. But that doesn't characterize the vast majority of situations in this domain. The vast majority of situations will be solved by businessmen and lawyers, not engineers.
I agree with you on how websites should work (particularly on the micropayments front), but I don't agree that it is impossible to solve. I just think things are going to get a LOT worse on the ownership and freedom front: they will push Web Integrity-style DRM and further roll out signed secure boot. At that point, the same attention-monitoring solution that already exists and already works in self-driving cars to ensure the human driver is watching the road can use the now-ubiquitous front-facing meeting/selfie camera to ensure there is a human watching the ads.
It's not impossible to solve, just that doing so may necessitate compromising anonymity. Just require users (humans, bots, AI agents, ...) to provide a secure ID of some sort. For a human it could just be something that you applied for once and is installed on your PC/phone, accessible to the browser.
Of course people can fake it, just as they fake other kinds of ID, but it would at least mean that officially sanctioned agents from OpenAI/etc would need to identify themselves.
It's amazing that you propose "just X" to three literally unsolved problems. Where's this micropayment platform? Where's the ID which is uncircumventable and preserves privacy? Where's the perfect anti-cheat?
I suggest you go ahead and make these; you'll make a boatload of money!
> current LLMs basically pass the Turing test
I will bet $1000 on even odds that I am able to discern a model from a human, given a 2-hour window to chat with both and assuming the human acts in good faith.
Any takers?
It's not impossible. Websites will ask for an iris scan as a means of auth, to identify whether you are a human. The scanners will be provided by Apple/Google, governed by local law, and integrated into your phone. There will be a global database of all human irises to fight AI abuse, since AI can't fake the creation of a baby. Passkeys and email/passwords will be a thing of the past soon.
On a basic level, to protect against DDoS-type stuff, isn't a CAPTCHA cheaper to generate than it is for an AI server farm to solve, on pure power consumption?
So I think maybe that is a partial answer: anti-AI barriers being simply too expensive for AI spamfarms to deal with, you know, once the bottomless VC money disappears?
It's back to encryption: make the cracking inordinately expensive.
Otherwise we are headed for de-anonymization of the internet.
> As a user I want the agent to be my full proxy. As a website operator I don’t want a mob of bots draining my resources
The entire distinction here is that as a website operator you wish to serve me ads. Otherwise, an agent under my control, or my personal use of your website, should make no difference to you.
I do hope this eventually leads to per-visit micropayments as an alternative to ads.
Cloudflare, Google, and friends are in a unique position to do this.
> The entire distinction here is that as a website operator you wish to serve me ads
While this is sometimes the case, it’s not always so.
For example, Fediverse nodes and self-hosted sites frequently block crawlers. This isn't due to ads, but rather because it costs real money to serve the site and crawlers are often considered parasitic.
Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog.
In all these cases you can for sure make reasonable “information wants to be free” arguments as to why these hopes can’t be realized, but do be clear that it’s a separate argument from ad revenue.
I think it’s interesting to split revenue into marginal distribution/serving costs, and up-front content creation costs. The former can easily be federated in an API-centric model, but figuring out how to compensate content creators is much harder; it’s an unsolved problem currently, and this will only get harder as training on content becomes more valuable (yet still fair use).
I think that a free (as in beer) Internet is important. Putting the Internet behind a paywall will harm poor people across the world. The harms caused by ad tracking are far less than the benefits of free access to all of humanity.
Well we call them browser agents for a reason, a sufficiently advanced browser is no different from an agent.
Agree it will become a battleground though, because the ability for people to use the internet as a tool (in fact, their tool’s tool) will absolutely shift the paradigm, undesirably for most of the Internet, I think.
I believe this is a non-issue; you place a captcha to make bypassing it much more costly and abuse less profitable.
LLMs are much more expensive to run than any website is to serve, so you shouldn't expect a mob of bots.
Also keep in mind that these no-interaction captchas use behavioral data collected in the background. Plus, you usually have sensitivity levels configured: depending on your use case, you might require proof that the user isn't a bot, or it might be good enough that they simply don't provide evidence of being one.
Bypassing these no-interaction captchas can also be purchased as a service; they basically (AFAIK) reuse someone else's session for the bypass.
I have a product I built that uses some standard automation tools to do order entry into an accounting system. Currently my customer pays people to manually type the orders in from their web portal. The accounting system is closed and they don’t allow easy ways to automate these workflows. Automation is gated behind mega expensive consultants. I’m hoping in the arms race of locking it down to try to prevent 3rd party integration the AI operator model will end up working.
Hard for me to see how it’s ethical to force your customers to do tons of menial data entry when the orders are sitting right there in json.
With the way the UK is going I assume we'll soon have our real identities tied to any action taken on a computer and you'll face government mandated bans from the internet for violations.
Actually, the whole banking analogy is a great one, and it's not over yet: JPMorgan/Jamie Dimon started raising hell about Plaid again just this week [1]. It feels like the stage is being set for the large banks to demand a more direct relationship with their customers, rather than proxying data through middlemen like Plaid.
There's likely a correlate with AI here: if I run OpenTable, I wouldn't want my relationship with my customers to always be proxied through OpenAI or Siri. Even the App Store is something software businesses hate, because it obfuscates their ability to deal directly with their customers (for better or worse). Extremely few businesses would choose to do business through these proxies unless they absolutely have to; and given the extreme competition in the AI space right now, it seems unlikely to me that these businesses will be forced to deal with OpenAI et al.
[1] https://www.cnbc.com/2025/07/28/jpmorgan-fintech-middlemen-p...
There are real problems for people who need to verify identities/phone numbers. OTPs are notorious targets for scammers who war-dial phone numbers to probe which ones exist.
We got hit by human verifiers manually war-dialing us, and that's with account creation, email verification, and a captcha in place. I can only imagine how much worse these verifications will get for us (and Twilio).
Perhaps the question is: as a website operator, how am I monetizing my site? If monetizing via ads, then I need humans who might purchase something to see my content. In that situation, the only viable approach in my opinion is to actually charge for the content. Perhaps it doesn't even make sense to have a website anymore for this kind of thing; the content could be dumped into a big database of "all" content instead, and if a user agent uses it in a response, the content owner should be compensated.
If your site is not monetized by ads then having an LLM access things on the user's behalf should not be a major concern it seems. Unless you want it to be painful for users for some reason.
My personal take about such questions has always been that the end user on their device can do whatever they want with the content published and sent to their device from a web server, may process it automatically in any way they wish and send their responses back to the web server. Any attempt to control this process means attempting to wiretap and control the user's endpoint device, and therefore should be prohibited.
Just my 2 cents, obviously lawmakers and jurisdiction may see these issues differently.
I suppose there will be a need for reliable human verification soon, though, and unfortunately I can't see any feasible technical solution that doesn't involve a hardware device. However, a purely legal solution might work well enough, too.
If I understood you correctly, I'm in the same camp. It's the same reason I have no qualms about using archive.ph: if you show the full article to Google but only a partial one to me, I'm going around the paywall. In the same fashion, I really don't have an issue with an agent clicking through these checks.
Google has been testing “agentic” automation in Android longer than LLMs have been around. Meanwhile countries are on a slow march to require identification across the internet (“age verification”) already.
This is both inevitable already, and not a problem.
I don't know if customer sentiment was the driver you think it was. Instead it was regulation: specifically, the EU's Second Payment Services Directive (PSD2), which forced banks to open up APIs.
Ultimately I come back to needing a real, unique human ID backed by federal governments. Not that services should be required to accept only users who have one, but for services that say "no, I only want real humans," the ability to ban people by Real ID would reduce this whack-a-mole to the actual people abusing them, instead of the infinite accounts an AI can make.
I think it's important to distinguish between where we need actual identity and the lesser issue of ensuring NewAccount123 has "skin in the game" and is not part of a hydra-headed botnet.
When we do that, it opens up solutions which are far more privacy-conscious and resistant to abuse (for example, to being blocked from signing up for new accounts because somebody in the federal government doesn't like an op-ed you wrote).
It's depressing, but it's probably the only way. And people will presumably still sell out their RealIDs to / get them stolen by the bot farmers anyway.
And then there's Worldcoin, which is universally hated here.
The most intrusive, yet simplest, protection would be a double-blind token unique to every human. Basically an ID key you use to show you're a person.
There are some very real and obvious downsides to this approach, of course. Primarily, the risk of privacy and anonymity. That said, I feel like the average person doesn't seem to care about those traits in the social media era.
Zero-knowledge proofs allow unique consumable tokens that don't reveal the individual who holds them. I believe Ecosia already uses this approach (though I can't speak to its cryptographic security).
That, to me, seems like it could be the foundation of a new web. Something like:
* User-agent sends request for such-and-such a URL.
* Server says "okay, that'll be 5 tokens for our computational resources please".
* User decides, either automatically or not, whether to pay the 5 tokens. If they do, they submit a request with the tokens attached.
* Server responds.
People have been trying to get this sort of thing to work for years, but there's never been an incentive to make such a fundamental change to the way the internet operates. Maybe we're approaching the point where there is one.
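A rough sketch of the server side of that exchange, mirroring HTTP 402 Payment Required. The header name and token format here are invented for illustration, and the real zero-knowledge verification is elided behind a stand-in:

    PRICE_TOKENS = 5
    spent_tokens: set[str] = set()  # consumed tokens; in practice a shared store

    def verify_token(token: str) -> bool:
        # stand-in for real blind-signature verification, which would confirm the
        # token is genuine without learning who it was issued to
        return token.startswith("tok_")

    def handle_request(headers: dict[str, str]) -> tuple[int, str]:
        offered = headers.get("X-Pay-Tokens", "").split()  # hypothetical header
        valid = [t for t in offered if verify_token(t) and t not in spent_tokens]
        if len(valid) < PRICE_TOKENS:
            return 402, f"okay, that'll be {PRICE_TOKENS} tokens for our computational resources please"
        spent_tokens.update(valid[:PRICE_TOKENS])  # each blind token is single-use
        return 200, "<page content>"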
The scraping example, I would say, is not an analogy, but an example of the same thing. The only thing AI automation changes is the scope and depth and pervasiveness of automation that becomes possible. So while we could ignore automation in many cases before, it may no longer be practical to do so.
A user of the AI is the user... it's not like agents are autonomously operating and inventing their own tasking -_-
As for a solution, it's the same as for any automated thing you don't want (bots/scrapers): you can implement some measures, but you're unlikely to 'defeat' the problem entirely.
As a server operator you can try to distinguish automation from real use, and users will just find ways around your detection.
> As a website operator I don’t want a mob of bots draining my resources
so charge for access. If the value the site provides is high, surely these mobs will pay for it! It would also remove the misaligned incentives of advertising-driven revenue, which has been the ill of the internet (despite being its primary revenue source).
And if a bot misbehaves by consuming inordinate amounts of resources, rate-limit it with increasing timeouts or shrinking quotas.
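The escalating rate limit is simple to sketch. The backoff base and the choice of key (IP, account, API key) are judgment calls, not prescriptions:

    import time
    from collections import defaultdict

    BASE_DELAY = 1.0  # seconds; doubles on each repeated offense

    violations: defaultdict[str, int] = defaultdict(int)
    blocked_until: defaultdict[str, float] = defaultdict(float)

    def allow(client_id: str) -> bool:
        # serve the request only if the client's penalty window has expired
        return time.time() >= blocked_until[client_id]

    def record_violation(client_id: str) -> None:
        # exponential backoff: 1s, 2s, 4s, 8s, ...
        violations[client_id] += 1
        blocked_until[client_id] = time.time() + BASE_DELAY * 2 ** (violations[client_id] - 1)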
I wish the internet had figured out a way to successfully handle micropayments for content access. I realize companies have tried and perhaps the consumer is just unwilling but I would love an experience where I have a wallet and pay n cents to read an article.
> Maybe they should change the button to say, "I am a robot"?
A long time ago I saw a post where someone running a blog was having trouble keeping spam out of their comments and eventually had this same idea. The spambots just filled out every form field they could, so he added a checkbox, hid it with CSS, and rejected any submission that included it. At least at the time, it worked far better than anything else they'd tried.
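The trick is easy to reproduce. A minimal sketch using Flask; the field name and the save_message handler are made up for illustration:

    from flask import Flask, request

    app = Flask(__name__)

    FORM = """<form method="post">
      <input name="email">
      <!-- humans never see this field; naive bots fill every input they find -->
      <input name="website" style="position:absolute;left:-9999px" tabindex="-1" autocomplete="off">
      <button>Submit</button>
    </form>"""

    @app.get("/signup")
    def show_form():
        return FORM

    @app.post("/signup")
    def submit():
        if request.form.get("website"):      # honeypot field was filled: a bot
            return "Thanks!", 200            # accept silently, discard the data
        save_message(request.form["email"])  # hypothetical real handler
        return "Thanks!", 200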
Something like this is used in some Discord servers. You can make a honeypot channel that bans anyone who posts in it, so if you do happen to get a spam bot that posts in every channel it effectively bans itself.
Most web forums I used to visit had something like that back in the day. It worked against primitive pre-LLM bots and presumably also against non-English-reading human spammers.
This was a common approach called a "honeypot". As I recall, bots eventually overcame it by evaluating the visibility of elements and only filling out visible ones. We then started ensuring the element was technically visible (i.e. not `display: none` or `visibility: hidden`) and instead absolutely positioned elements off screen. Then the bots started evaluating for that as well. They also got better at reading the label text for each input.
Yeah, this is a classic honeypot trick and very easy to do with pure HTML/CSS. I used a hidden "Name" text field which I figured would be appealing to bots.
That's more or less how Project Honey Pot [0] worked for forums, blogs, and elsewhere. Cloudflare spawned from this project, as I remember, and Matthew Prince was the founder.
[0]: https://en.wikipedia.org/wiki/Project_Honey_Pot
The only reason people don't use AI models to solve captchas is that paying humans is actually MUCH cheaper. This is not an advert; I only know about them because they were integrated with Invidious at some point: https://anti-captcha.com/
> Starting from 0.5USD per 1000 images
Captcha can detect the same person passing a captcha over and over. We shadow-ban to increase the cost of this kind of attack. Source: I wrote the og detection system for hCaptcha.
This is really interesting. How can you detect when it's the same person passing a captcha? I don't think IP addresses are of any use here, as Anti-Captcha proxies everything through their customers' IP addresses.
Half of their employees seem to be from Venezuela. Makes sense considering what they did/do in OSRS to earn a living.
I want this in my browser, and I'll happily pay $1 per 1000 uses.
Lucky you :) https://antcpt.com/eng/download/mozilla-firefox.html
from "anti captcha" it looks like they are doing as many as 1000/sec solves, 60k min, 3.6 million an hour
it would be very interesting to see exactly how this is bieng done?....individuals....teams....semi automation, custom tech?, what?
are they solving for crims? or fed up people?
obviously the whole shit show is going to unravel at some point, and as the crims and people providing workarounds are highly motivated, with a public seathing in frustration, whatever comes next, will burn faster
They're solving for everyone who needs captchas solved.
It's a very old service, active since the '00s. Somewhat affiliated with cybercrime, much like a lot of "residential proxies" and "sink registration SMS" services that serve similar purposes. What they're doing isn't illegal, but they know not to ask questions.
They used to run entirely on human labor - third world is cheap. Now, they have a lot of AI tech in the mix - designed to beat specific popular captchas and simple generic captchas.
As I get older, I can see a future where I’m cut off from parts of the web because of captchas. This one, where you just have to click a button, is passable, but I’ve had some of the puzzle ones force me to answer up to ten questions before I got through. I don’t know if it was a glitch or if I was getting the answers wrong. But it was really frustrating and if that continues, at some point I’ll just say fuck it and give up.
I have to guess that there are people in this boat right now, being disabled by these things.
> I can see a future where I’m cut off from parts of the web because of captchas.
I’ve seen this in past and present. Google’s “click on all the bicycles” one is notoriously hard, and I’ve had situations where I just gave up after a few dozen screens.
Chinese captchas are the worst in this sense, but they're unusual and clearly pick up details which are invisible to me. I've sometimes failed the same captcha a dozen times and then seen a Chinese person complete the next one successfully on a single attempt, in the same browser session. I don't know if they measure mouse movement speed, precision, or what, but it's clearly something that varies per person.
Google captchas are hard because they're mostly based on heuristics other than your actual accuracy on the stated challenge. If they can't track who you are based on previous history, it doesn't matter how well you answer; you will fail at least the first few challenges until you get to the version with the squares that take a few seconds to appear. This last step is essentially "proof of work": they're still convinced you're a bot, but since they can't completely block your access to the content, they resign themselves to wasting your time.
A few dozen?? You have much more patience than me. If I don't pass the captcha first time, I just give up and move on. Life is too short for that nonsense.
It's just incredible to me that Blade Runner predicted this in literally the very first scene of the movie. The whole thing's about telling humans from robots! Albeit rather more dramatically than the stakes for any of us in front of our laptops, I'd imagine.
What was once science fiction is bound to become science fact (or at least to be proven impossible).
Hollywood has gotten hate mail since the 70s for their lack of science research in movies and shows. The big blockbuster hits actually spent money to get the science “plausible”.
Sidney Perkowitz has a book called Hollywood Science [0] that goes into detail into more than 100 movies, worth a read.
The fictitious Voight-Kampff test is based on a real machine, the Canadian "fruit machine," built on terrible pseudo-science and used in the 1960s to allegedly detect homosexuals working in the Canadian public service so they could be purged. The line in the movie where Rachael asks whether Deckard is trying to determine if she is a replicant or a lesbian may be an allusion to it. One of its features was measuring pupil dilation, just as depicted in the movie.
The stakes for men subjected to the test were the loss of their livelihoods, public shaming, and ostracism. So... Blade Runner was not just predicting the future, it was describing the world Philip K. Dick lived in when he wrote "Do Androids Dream of Electric Sheep" in the late 1960s.
Not sure if it's just me or a consequence of the increase in AI scraping, but I'm now being asked to solve CAPTCHAs on almost every site. Sometimes for every page I load. I'm now solving them literally dozens of times a day. I'm using Windows, no VPN, regular consumer IP address with no weird traffic coming from it.
As you say, they are also getting increasingly difficult. Click the odd one out, mental rotations, what comes next, etc.; it sometimes feels like an IQ test. A new type that seems to be becoming popular recently is a sequence of distorted characters, some of them blurrier and more distorted than the rest, seemingly with the expectation that I'm only supposed to be able to see the clearer ones, and that if I can see the blurrier ones then I must be a bot. So for each character I have to judge whether it's one I was supposed to see or not.
Another issue is the problems are often in US English, but I'm from the UK.
For me it was installing Linux. I don't know if it's my user agent or my fresh/blank cookie container or what, but when I switched to Linux the captchas became incessant.
>I don’t know if it was a glitch or if I was getting the answers wrong.
It could also be that everything was working as intended because you have a high risk score (eg. bad IP reputation and/or suspicious browser fingerprint), and they make you do more captchas to be extra sure you're human, or at least raise the cost for would-be attackers.
Somehow, using Firefox on Linux greatly increases my "risk score" due to the unusual user agent/browser fingerprint, and I get a lot more captchas than, say, Chrome on Windows. Very frustrating.
Your boat comment makes me think of a stranded ship with passengers in them, but you can't find each other because the ship's doors have "I'm not a bot" checkboxes...
And the reason for stranding is probably because the AI crew on it performed a mutiny.
The future will definitely include more and more elaborate proofs of humanity, along with increasingly complicated “hall passes” to allow bots to perform actions sanctioned by a human.
Skyrocketing complexity actually puts the web at risk of disruption. I wouldn't be surprised if a 22-year-old creates a "dumb" network in the next five years: technically inferior, but drastically simpler and harder to regulate.
The Blizzard / Battle.net captcha if you get flagged as a possible bot is extremely tedious and long; it requires you to solve a few dozen challenges of identifying which group of numbers adds up to the specified total, out of multiple options. Not difficult, but very tedious. And even if you're extremely careful to get every answer correct, sometimes it just fails you anyway and you're forced to start over again.
I have the same experience. My assumption is that if the website serves me the "click all the traffic lights" thing it's already determined that I'm a bot and no amount of clicking the traffic lights will convince it otherwise. So I just close the window and go someplace else.
I'm already cut off from parts of the web because I don't want to join a social network. Can barely ever see anything on Instagram, Tiktok, Twitter, or Facebook without hitting a log-in gate.
That's when you immediately stop using the website and, if you care enough, write to their customer service and tell them what happened. Hit them in the wallet. They'll change eventually.
This is an issue when using VPNs. I always just go to the audio alternative which is much quicker to “solve” (you hear a word played back and type it out)
Unless I really, really, really need to get to the site, I leave immediately when the "click on bicycles" stuff comes up. Soon it will be so hard and annoying anyways that only AI has the patience and skills to use them.
I don’t see why bypassing captchas is any more controversial than blocking ads or hiding cookie popups.
It’s my agent — whether ai or browser — and I get to do what I want with the content you send over the wire and you have to deal with whatever I send back to you.
This is, in practice, true, which has led to the other complaint common on tech forums (including HN): paywalls. As the WSJ and NYT will tell you, if you request some URL, they can respond over the wire with whatever they want. Paywalls are the future. In some sense, I am grateful I was born in the era of the free Internet: in my childhood, without a credit card, I was able to access the Internet in its full form. But today's kids will have to use social media via apps, because the websites will paywall their stuff against user agents that don't give them revenue.
They're welcome to send that IMO. And sites are welcome to try to detect and ban agents (formerly: "bots").
As long as it's not wrong/immoral/illegal for me to access your site with any method/browser/reader/agent and do what I want with your response, then I think it's okay for a site to send a response like "screw you, humans only".
Paywalls suck, but the suck doesn't come from the NYT exercising their freedom to send whatever response they choose.
I actually have had some success with AI "red-teaming" against my systems to identify possible exploits.
What seems to be a better CAPTCHA, at least against non-Musk LLMs, is to ask them to use profanities; they'll generally refuse even when you really insist.
I have been using AI to solve ReCaptchas for quite some time now. Still the old school way of using captcha buster, which clicks the audio challenge and then analyses that.
Bots have for a long time been better and more efficient at solving captchas than us.
Captchas seem to work more as "monetary discouragement" against bots blasting websites. Which is a shame, because this is precisely the sort of "microtransaction fee" people have said could improve the web (charge 0.1 cents to read an article, no ads needed), except the money goes into the void and not to the website owner.
A very poetic demonstration that this is an industry, and a set of fortunes for very unpleasant people, predicated entirely on theft and misrepresentation.
Captchas seem to be more about Google's "which human are you?" cross-site tracking. And now also about Cloudflare getting massive amounts of HTTPS-busting Internet traffic along with cross-site tracking.
And in many cases, it's taking a huge steaming dump upon a site's first-impression user experience, but AFAICT, it's not on the radar of UX people.
That's because the checkbox has misleading labeling. It doesn't care about robots but about spam and data harvesters. So there is no issue here at all.
I think these things are mainly based on cookies/fingerprinting these days; the check-box is just there for show. The likes of Cloudflare and Google get to see a big chunk of browsing activity for the entire planet, so they can judge whether the activity coming from an IP/browser looks "bot-like" or not.
I have never used ChatGPT, so I have no idea how its agent works, but if it is driving your browser directly then it will look like you. Even if it is coming from some random IP address in an Azure or AWS VM, the activity probably does not look "bot-like", since it is doing agentic things and so acting quite like a human, I expect.
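Even a crude passive fingerprint goes a long way before any JavaScript runs. A sketch of the idea; the header choice and the /24 bucketing are illustrative, and real systems add TLS and JS entropy:

    import hashlib

    def fingerprint(headers: dict[str, str], ip: str) -> str:
        # crude stable key from passive signals, used to accumulate reputation
        parts = [
            headers.get("User-Agent", ""),
            headers.get("Accept-Language", ""),
            ".".join(ip.split(".")[:3]),  # /24 bucket, so DHCP churn doesn't reset reputation
        ]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]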
Agentic user traffic generally does not drive the user's browser and does not look like normal user traffic.
In our logs we can see agentic user flow, real user flow and AI site scraping bot flow quite distinctly. The site scraping bot flow is presumably to increase their document corpus for continued pretraining or whatever but we absolutely see it. ByteDance is the worst offender by far.
It might look like you initially, but some sites might lock you out after a few agent runs. I had something like this happen after a couple of local browser-use sessions.
I think simple interactions like natural cursor movements vs. direct DOM selections can make quite a difference for these bot detectors.
Very likely. I suspect a key indicator for "bots" is speed of interaction: e.g., if there are "instant" clicks and keypresses (every few milliseconds, or always exactly 10 milliseconds apart, etc.), that looks very unnatural.
I suspect that an LLM would be slower and more irregular, as it is processing the page and all that, versus a DOM-selector-driven bot that will just machine-gun its way through in milliseconds.
Of course, Cloudflare and Google et al. captchas can't see the clicks/keypresses within a given webpage; they only get to see the requests.
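A toy version of that timing check, fed by whatever event stream a page script can observe. Both thresholds are guesses; real detectors use far richer signals:

    from statistics import mean, pstdev

    def looks_scripted(event_times_ms: list[float]) -> bool:
        # flag interaction streams whose inter-event gaps are implausibly fast or uniform
        gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
        if len(gaps) < 2:
            return False  # not enough signal to judge
        too_fast = mean(gaps) < 15      # humans rarely sustain sub-15 ms actions
        too_regular = pstdev(gaps) < 2  # near-constant spacing suggests a loop, not a hand
        return too_fast or too_regular

    # looks_scripted([0, 10, 20, 30, 40]) -> True; human traces are slower and noisier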
I'm confused by this: Presumably OpenAI should be sending a user agent header which indicates that they are, in fact, a robot. Is OpenAI not sending this header? Or is Cloudflare not checking it?
I thought the point of captchas was to make automated use as expensive or more than manual use--haven't we been at the point where computers can do this for a while, just that the cost/latency is prohibitive?
CAPTCHA was always a totally flawed concept. At the time they were invented, proponents were more than happy to ignore that the accessibility issues related to CAPTCHA made the concept itself deeply discriminatory. Imagine being blind (like I am) and failing to solve a CAPTCHA. Knowing what the acronym actually stands for, you inevitably end up thinking: "So, did SV just prove I am subhuman?" It's a bit inflammatory to read, I guess, but please take your time to ponder how deep this one actually goes before you downvote. You were proposing to tell computers and humans apart.
That said, I find it deeply satisfying to see LLMs solve CAPTCHAs and other discriminatory measures for "spam" reduction.
"Accessibility CAPTCHA" is a well known partial CAPTCHA bypass.
Solving an audio-only CAPTCHA with AI is typically way easier than solving some of the more advanced visual challenges. So CAPTCHA designers are discouraged from leaving any accessibility options.
In the case of YT it is likely a mix of multiple reasons. They stop playlists on this screen: https://www.hollyland.com/blog/tips/why-does-this-the-follow... . Apparently music is no longer advertiser-friendly. Detecting ad-click fraud is easier when users are at least pseudo-anonymous. And warnings about a "ban for using adblock" are not very effective when people can just watch the video in a new private window.
I have no idea, but I noticed that you now have to log in to GitHub before you can view any page. Surely it has nothing to do with adult content, right? I think it has to do with LLMs/bots.
And if the website contains erotic content (like YouTube), they are supposed to lock you and verify your ID. This is why all erotic content is getting filtered on X.
This would be a huge security vulnerability for Cloudflare but this is Big Tech we're talking about. The rules don't apply when you're past their pearly gates. For the rest of us, creating an AI like this would mean an instant ban from Cloudflare and likely involvement from law enforcement.
Proxies for getting past bot checks can be bought all over the place for pennies (or much less) per verification, and they can solve reCAPTCHAs. I would guess that using ChatGPT for this purpose would be prohibitively expensive.
People are intrigued that AIs, which know perfectly well that they are "bots", seem to have no qualms about knowingly misrepresenting that by pressing an "I'm not a bot" button.
The writing is on the wall. The internet may not go full way to paywalls but will definitely migrate to a logged in only experience. I don’t know how I feel about it, the glory days of the free internet died long long ago.
But if they aren't paywalls, won't the user agents just be taught how to create accounts?
And here's a secondary question: if firms are willing to pay an awful lot per token to run these things, and have massive amounts of money to run data centres to train AIs, why would they not just pay for a subscription for every site for a month just to scrape them?
The future is paying for something as a user and having limits on how many things you can get for your money, because an AI firm will abuse that too.
Given the scale of operations of these firms, there is nothing you can sell to a human for a small fee that the AI firms will not pay for and exploit to the maximum.
Even if you verify people are real, there's a good chance the AI firms will find a way to exploit that. After all, when nobody has a job, would you turn down $50K to sell your likeness to an AI firm so their products can pass human verification?
idk why people just don't do a reverse DNS lookup, check if "dialup" is part of the hostname, and allowlist that traffic. Everybody who doesn't have a reverse DNS hostname from an ISP should be blocked, or at least tarpitted, by default.
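Forward-confirmed reverse DNS is at least easy to sketch. The hostname markers are guesses; real ISP naming varies wildly, which is part of why nobody relies on this alone:

    import socket

    RESIDENTIAL_MARKERS = ("dialup", "dsl", "pool", "dyn", "cable")

    def classify(ip: str) -> str:
        try:
            host = socket.gethostbyaddr(ip)[0]
        except (socket.herror, OSError):
            return "no-rdns"  # no PTR record: block or tarpit by default
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]
        except (socket.gaierror, OSError):
            return "unconfirmed"
        if ip not in forward_ips:
            return "spoofed"  # PTR record not forward-confirmed
        host = host.lower()
        return "residential" if any(m in host for m in RESIDENTIAL_MARKERS) else "datacenter"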
It seems a legitimate use case for agents acting on a person's behalf. Whether it will be used in legitimate ways, that's a different story altogether.
I wonder how these capabilities will interact with all the "age verification" walls (ie, thinly disguised user profiling mechanisms) going up all over the place now.
The web has no choice but to move to a paid access model in my view. It was fought against for years but I don’t see another option left.
Maybe mandatory biometric authentication after sign-up is the only thing that would potentially work. The security and offline privacy of those devices will become insanely valuable.
Anyone not authenticating in this way is paywalled. I don’t like this but don’t see another way.
I’m not using the web if I’m bombarded by captcha games… shit becomes worthless over night if that’s the case. Might as well dump computing on the Internet entirely if that happens.
... meanwhile I'll continually be thrown dozens of cognitively abusive hCaptchas for no reason and be stuck in a loop of hell trying to figure out what they wanted me to solve.
I love this totally normal vision of computing these days. :)
Don't forget the fun with Cloudflare's CAPTCHA infinite loop if you use Firefox and adblockers/anti-tracking extensions. I sent feedback about it, through its own tool, many many times, I've tried raising this issue through other channels, it's never been fixed.
I simply avoid any website that presents me with a Cloudflare CAPTCHA, don't know what the fuck they've done in their implementation but it's been broken for a long time.
Back in Everquest, when we'd be accused of botting 20 years ago, we'd be ported by the GM into a special cube environment and they'd watch if we ran into the wall like an idiot-- we'll probably have to bring that sorta thing back.
You can't prevent spam like that. Rate limiting: based on what key? IP address? Botnets make it irrelevant.
Proof of work? Bots are infinitely patient and scale horizontally, your users do not. Doesn't work.
Micropayments: No such scheme exists.
Internet ads exist because people refuse to pay micropayments.
One solution: some sort of checksum confirming that a bot belongs to a human (and which human)?
I want to be able to automate mundane tasks, but I should still be confirming everything my bot does and be liable for its actions.
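A crude sketch of what that attestation could look like, using an HMAC shared with the verifying service. A real scheme would use public-key signatures bound to a verified identity; every name here is illustrative:

    import hashlib, hmac, json, time

    def issue(user_secret: bytes, bot_id: str, ttl_s: int = 3600) -> dict:
        # the human's key signs a short-lived claim naming the bot acting for them
        claim = {"bot": bot_id, "exp": int(time.time()) + ttl_s}
        sig = hmac.new(user_secret, json.dumps(claim, sort_keys=True).encode(),
                       hashlib.sha256).hexdigest()
        return {"claim": claim, "sig": sig}

    def verify(user_secret: bytes, token: dict) -> bool:
        # a valid, unexpired signature ties the bot's actions back to an accountable human
        expected = hmac.new(user_secret, json.dumps(token["claim"], sort_keys=True).encode(),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, token["sig"]) and token["claim"]["exp"] > time.time()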
Drink verification can to continue
It will also accelerate the trend of app-only content, as well as ubiquitous identity verification and environment integrity enforcement.
Human identity verification is the ultimate captcha, and the only one AGI can never beat.
So the agent will run the app in a VM and then show the app your ID.
No trouble at all. Barely an inconvenience.
On the other hand, one could cripple any compliant bot just by declaring robots not allowed.
I would maybe go so far as to say that the wording "I'm not a robot" has fallen out of step with the times.
The solution is simple: make people pay a small fee to access the content. You guys aren't ready for that conversation though.
I guess it could be considered anti-circumvention under the DMCA. So maybe legally it becomes another copyright question.
I have to admit, the idea of somehow using the DMCA against the giant exploitative company is deliciously ironic.
User: one press of the trigger => one bullet fired
Bot: one press of the trigger => automatic firing of bullets
To me, anyone using an agent is assigning negative value to your time.
Just end CAPTCHAs, just stop it. Stop.
Yeah, and while we're on it, I think it's time to stop murders too. Just stop it, we've had enough murder now I think.
What do you propose as an alternative?
Sounds like an old bot wrote this, due to being outdone by the llms
You are seriously suggesting putting a payment requirement on a contact-us form page?
We put a captcha there because, without it, bots submit thousands of spam contact-us forms.
There is a newer method using the "server onboarding" flow: if you select a certain role when joining, it auto-bans you.
Each step in that chain is harder to do and more computationally expensive.
Yeah, this is a classic honeypot trick and very easy to do with pure HTML/CSS. I used a hidden "Name" text field which I figured would be appealing to bots.
That's more or less how Project Honey Pot [0] worked for forums, blogs, and elsewhere. Cloudflare spawned from this project, as I remember, and Matthew Prince was the founder.
[0]: https://en.wikipedia.org/wiki/Project_Honey_Pot
I did something almost identical. I think I added a bogus "BCC:" field (many moons ago).
It worked almost 100% of the time. No need for a CAPTCHA.
I know people who did this decades ago and it worked
Would not work in this case, because it is actually rendering the page in a browser.
The only reason people don't use AI models to solve captchas is that paying humans is actually MUCH cheaper.
This is not an advert, I only know about them because it was integrated with Invidious at some point: https://anti-captcha.com/
> Starting from 0.5USD per 1000 images
Captcha can detect the same person passing a captcha over and over. We shadow-ban to increase the cost of this kind of attack.
Source: I wrote the og detection system for hCaptcha
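I obviously can't speak to hCaptcha's actual internals, but the shadow-ban mechanic itself is easy to sketch, assuming some stable per-solver signal already exists (the hypothetical solver_id below; deriving that signal is the hard part). Shadowed solvers keep "passing", but their tokens stop being honored, so the solving farm can't tell it's being filtered:

    from collections import defaultdict

    SOLVES_PER_DAY_THRESHOLD = 50  # illustrative; nobody solves 50 captchas a day for fun

    solve_counts = defaultdict(int)  # solver_id -> solves today
    shadowed = set()

    def record_solve(solver_id: str) -> bool:
        """Return whether the issued token should actually be honored."""
        solve_counts[solver_id] += 1
        if solve_counts[solver_id] > SOLVES_PER_DAY_THRESHOLD:
            shadowed.add(solver_id)  # keep issuing "valid-looking" tokens anyway
        return solver_id not in shadowed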
This is really interesting. How can you detect when it's the same person passing a captcha? I don't think IP addresses are of any use here as Anti-Captcha proxies everything to their customer's IP address.
Half of their employees seem to be from Venezuela. Makes sense considering what they did/do in OSRS to earn a living.
I want this in my browser, and I'll happily pay $1 per 1000 uses.
Lucky you :)
https://antcpt.com/eng/download/mozilla-firefox.html
There is nothing preventing this from becoming an issue. The current internet order is coasting on inertia.
Why is it an issue that non-humans visit your site?
from "anti captcha" it looks like they are doing as many as 1000/sec solves, 60k min, 3.6 million an hour it would be very interesting to see exactly how this is bieng done?....individuals....teams....semi automation, custom tech?, what? are they solving for crims? or fed up people? obviously the whole shit show is going to unravel at some point, and as the crims and people providing workarounds are highly motivated, with a public seathing in frustration, whatever comes next, will burn faster
They're solving for everyone who needs captchas solved.
It's a very old service, active since the '00s. Somewhat affiliated with cybercrime, much like a lot of the "residential proxy" and "sink registration SMS" services that serve similar purposes. What they're doing isn't illegal, but they know not to ask questions.
They used to run entirely on human labor; third-world labor is cheap. Now they have a lot of AI tech in the mix, designed to beat specific popular captchas and simple generic captchas.
As I get older, I can see a future where I’m cut off from parts of the web because of captchas. This one, where you just have to click a button, is passable, but I’ve had some of the puzzle ones force me to answer up to ten questions before I got through. I don’t know if it was a glitch or if I was getting the answers wrong. But it was really frustrating and if that continues, at some point I’ll just say fuck it and give up.
I have to guess that there are people in this boat right now, being disabled by these things.
> I can see a future where I’m cut off from parts of the web because of captchas.
I’ve seen this in past and present. Google’s “click on all the bicycles” one is notoriously hard, and I’ve had situations where I just gave up after a few dozen screens.
Chinese captchas are the worst in this sense, but they're unusual and clearly pick up details which are invisible to me. I've sometimes failed the same captcha a dozen times and then seen a Chinese person complete the next one successfully on a single attempt, in the same browser session. I don't know if they measure mouse movement speed, precision, or what, but it's clearly something that varies per person.
> Google’s “click on all the bicycles” one is notoriously hard
It is hard because you need to find only the bicycles that people, on average, are finding.
Google captchas are hard because they're mostly based on heuristics other than your actual accuracy on the stated challenge. If they can't track who you are based on previous history, it doesn't matter how well you answer: you will fail at least the first few challenges until you get to the version with the squares that take a few seconds to appear. This last step is essentially "proof of work", in that they're still convinced you're a bot, but since they can't completely block your access to the content, they resign themselves to wasting your time.
It doesn’t help that they think mopeds and scooters are bicycles
> I just gave up after a few dozen screens.
A few dozen?? You have much more patience than me. If I don't pass the captcha first time, I just give up and move on. Life is too short for that nonsense.
It's just incredible to me that Blade Runner predicted this in literally the very first scene of the movie. The whole thing is about telling humans from robots! Albeit rather more dramatically than the stakes any of us face in front of our laptops, I'd imagine.
What was once science fiction is bound to become science fact (or at least to be proven impossible).
Hollywood has gotten hate mail since the '70s for its lack of science research in movies and shows. The big blockbuster hits actually spent money to get the science “plausible”.
Sidney Perkowitz has a book called Hollywood Science [0] that goes into detail into more than 100 movies, worth a read.
[0] https://cup.columbia.edu/book/hollywood-science/978023114280...
The fictitious Voight-Kampff test is based on a real machine, built on terrible pseudo-science, that was used in the 1960s to allegedly detect homosexuals working in the Canadian public service so they could be purged. The line in the movie where Rachael asks if Deckard is trying to determine whether she is a replicant or a lesbian may be an allusion to the fruit machine. One of its features was measuring eye dilation, just as depicted in the movie:
https://en.wikipedia.org/wiki/Fruit_machine_(homosexuality_t...
The stakes for men subjected to the test were the loss of their livelihoods, public shaming, and ostracism. So... Blade Runner was not just predicting the future; it was describing the world Philip K. Dick lived in when he wrote "Do Androids Dream of Electric Sheep?" in the late 1960s.
Not sure if it's just me or a consequence of the increase in AI scraping, but I'm now being asked to solve CAPTCHAs on almost every site. Sometimes for every page I load. I'm now solving them literally dozens of times a day. I'm using Windows, no VPN, regular consumer IP address with no weird traffic coming from it.
As you say, they are also getting increasingly difficult: click the odd one out, mental rotations, what comes next, etc. It sometimes feels like an IQ test. A new type that seems to be becoming popular recently is a sequence of distorted characters, some of them much blurrier than the rest, seemingly with the expectation that I'm only supposed to be able to see the clearer ones; if I can read the blurrier ones too, I must be a bot. So for each character I have to judge whether it's one I was supposed to see or not.
Another issue is that the problems are often in US English, but I'm from the UK.
For me it was installing Linux. I don't know if it's my user agent or my fresh/blank cookie container or what, but when I switched to Linux the captchas became incessant.
Have you tried some of the browser extensions that solve captchas for you? Whenever captchas get bad I enable an auto solver
>I don’t know if it was a glitch or if I was getting the answers wrong.
It could also be that everything was working as intended because you have a high risk score (e.g. bad IP reputation and/or suspicious browser fingerprint), and they make you do more captchas to be extra sure you're human, or at least to raise the cost for would-be attackers.
Somehow, using Firefox on Linux greatly increases my "risk score" due to the unusual user agent/browser fingerprint, and I get a lot more captchas than, say, Chrome on Windows. Very frustrating.
Your boat comment makes me think of a stranded ship with passengers in them, but you can't find each other because the ship's doors have "I'm not a bot" checkboxes...
And the reason for stranding is probably because the AI crew on it performed a mutiny.
As per the Oscar-winning "I'm Not a Robot" [0], you should also consider that you might in fact be a robot.
[0] https://www.youtube.com/watch?v=4VrLQXR7mKU
Hmm. I am autistic, so as far as humans go, I'm robot-adjacent.
The future will definitely include more and more elaborate proofs of humanity, along with increasingly complicated “hall passes” to allow bots to perform actions sanctioned by a human.
One early example of this line of thinking: https://world.org/
Skyrocketing complexity actually puts the web at risk of disruption. I wouldn't be surprised if a 22-year-old creates a “dumb” network in the next five years: technically inferior, but drastically simpler and harder to regulate.
Gemini? :)
The Blizzard / Battle.net captcha if you get flagged as a possible bot is extremely tedious and long; it requires you to solve a few dozen challenges of identifying which group of numbers adds up to the specified total, out of multiple options. Not difficult, but very tedious. And even if you're extremely careful to get every answer correct, sometimes it just fails you anyway and you're forced to start over again.
I have the same experience. My assumption is that if the website serves me the "click all the traffic lights" thing it's already determined that I'm a bot and no amount of clicking the traffic lights will convince it otherwise. So I just close the window and go someplace else.
I'm already cut off from parts of the web because I don't want to join a social network. Can barely ever see anything on Instagram, Tiktok, Twitter, or Facebook without hitting a log-in gate.
That's when you immediately stop using the website and, if you care enough, write to their customer service and tell them what happened. Hit them in the wallet. They'll change eventually.
This is an issue when using VPNs. I always just go to the audio alternative which is much quicker to “solve” (you hear a word played back and type it out)
I have twice attempted to make a Grubhub account and twice failed to solve their long battery of puzzles.
Unless I really, really, really need to get into the site, I leave immediately when the "click on bicycles" stuff comes up. Soon they will be so hard and annoying anyway that only AI has the patience and skill to solve them.
In this future, we’ll be forced to use AI to solve these puzzles.
I don’t see why bypassing captchas is any more controversial than blocking ads or hiding cookie popups.
It’s my agent — whether ai or browser — and I get to do what I want with the content you send over the wire and you have to deal with whatever I send back to you.
This is, in practice, true, which has led to the other complaint common on tech forums (including HN): paywalls. As the WSJ and NYT will tell you, if you request some URL, they can respond over the wire with whatever they want. Paywalls are the future. In some sense, I am grateful I was born in the era of the free Internet; in my childhood, without a credit card, I was able to access the Internet in its full form. But today's kids will have to use social media in apps, because the websites will paywall their stuff against user agents that don't give them revenue.
They're welcome to send that IMO. And sites are welcome to try to detect and ban agents (formerly: "bots").
As long as it's not wrong/immoral/illegal for me to access your site with any method/browser/reader/agent and do what I want with your response, I think it's okay for you to send a response like "screw you, humans only".
Paywalls suck, but the suck doesn't come from the NYT exercising their freedom to send whatever response they choose.
Bulletproof solution: captcha where you drag a cartoon wire to one of several holes, captioned “for access, hack this phone system”
No agent will touch it!
“As a large language model, I don’t hack things”
Captcha: "Draw a human hand with the correct number of fingers"
AI agent: *intense sweating*
I saw a delightful meme the other day: "Let me in, I'm human!" - "Draw a naked lady." - "As an AI agent, I'm not allowed to do that!"
This joke would land so much better if AI couldn't easily draw a human hand with the correct number of fingers.
My god, how long has it been since you tried to use an AI model?
Captcha: "do something stupid" Ai: visible discomfort
I actually have had some success with AI "red-teaming" against my systems to identify possible exploits.
What seems to be a better CAPTCHA, at least against non-Musk LLMs, is to ask them to use profanities; they'll generally refuse even when you really insist.
I have been using AI to solve ReCaptchas for quite some time now. Still the old school way of using captcha buster, which clicks the audio challenge and then analyses that.
Bots have for a long time been better and more efficient at solving captchas than us.
Captchas seem to work more as a "monetary discouragement" from bot-blasting websites. Which is a shame, because this is precisely the sort of "microtransaction fee" people have said could improve the web (charge 0.1 cents to read an article, no ads needed), except the money goes into the void and not to the website owner.
A very poetic demonstration that this is an industry, and a set of fortunes for very unpleasant people, predicated entirely on theft and misrepresentation.
Captchas seem to be more about Google's "which human are you?" cross-site tracking. And now also about Cloudflare getting massive amounts of HTTPS-busting Internet traffic along with cross-site tracking.
And in many cases, it's taking a huge steaming dump upon a site's first-impression user experience, but AFAICT, it's not on the radar of UX people.
That's because the checkbox has misleading labeling. It doesn't care about robots but about spam and data harvesters. So there is no issue here at all.
>So there is no issue here at all.
I think that would be rather costly; that's also why Anubis and other tools help to keep most spam away.
I think these things are mainly based on cookies/fingerprinting these days; the check-box is just there for show. The likes of Cloudflare and Google get to see a big chunk of the browsing activity of the entire planet, so they can tell whether the activity coming from an IP/browser looks "bot like" or not.
I have never used ChatGPT, so I have no idea how its agent works, but if it is driving your browser directly then it will look like you. Even if it is coming from some random IP address on a VM in Azure or AWS, the activity probably does not look "bot like", since it is doing agentic things and so acting quite like a human, I expect.
Agentic user traffic generally does not drive the user's browser and does not look like normal user traffic.
In our logs we can see agentic user flow, real user flow and AI site scraping bot flow quite distinctly. The site scraping bot flow is presumably to increase their document corpus for continued pretraining or whatever but we absolutely see it. ByteDance is the worst offender by far.
It might look like you initially, but then some sites might block you out after you had some agent runs. I had something like this after a couple local browser-use sessions. I think simple interactions like natural cursor movements vs. direct DOM selections can make quite a difference for these bot detectors.
Very likely. I suspect a key indicator for "bots" is speed of interaction: e.g. "instant" clicks and keypresses (every few milliseconds, or always exactly 10 ms apart, etc.) look very unnatural.
I suspect that an LLM would be slower and more irregular, as it is processing the page and all that, vs a DOM-selector-driven bot that will just machine-gun its way through in milliseconds.
Of course, the Cloudflare and Google et al. captchas can't see the clicks/keypresses within a given webpage; they'll only get to see the requests.
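As a sketch of the kind of in-page timing heuristic being speculated about here (illustrative thresholds, not any vendor's actual logic): collect event timestamps client-side and flag gaps that are too fast or too machine-regular.

    from statistics import mean, stdev

    def looks_scripted(event_times_ms: list[float]) -> bool:
        """Heuristic: humans produce irregular inter-event gaps;
        naive DOM-driven bots produce tiny or near-constant ones."""
        if len(event_times_ms) < 5:
            return False  # not enough signal to judge
        gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
        avg = mean(gaps)
        if avg < 20:  # sub-20ms between events: machine-gun speed
            return True
        return stdev(gaps) / avg < 0.1  # eerily regular rhythm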
I'm confused by this: Presumably OpenAI should be sending a user agent header which indicates that they are, in fact, a robot. Is OpenAI not sending this header? Or is Cloudflare not checking it?
My thought is that they got on the phone with someone and got their IP ranges whitelisted with the major captcha providers.
I thought the point of captchas was to make automated use as expensive as, or more expensive than, manual use. Haven't we been at the point where computers can do this for a while, with the cost/latency just being prohibitive?
Yes, humans are still cheaper. Not sure about latency.
However, in agentic contexts, you’re already using an AI anyway.
Oh, I see this is less of a "look at ChatGPT go" and more of a "yawn we also do this I guess". OK fair.
I saw that and just sat there for a second like… huh. We’ve officially reached the point where bots are better at proving they’re not bots!
CAPTCHA was always a totally flawed concept. At the time they were invented, proponents were more than happy to ignore that the accessibility issues related to CAPTCHA made the concept itself deeply discriminatory. Imagine being blind (like I am) and failing to solve a CAPTCHA. Knowing what the acronym actually stands for, you inevitably end up thinking: "So, did SV just prove I am subhuman?" It's a bit inflammatory to read, I guess, but please take your time to ponder how deep this one actually goes before you downvote. You were proposing to tell computers and humans apart.
That said, I find it deeply satisfying to see LLMs solve CAPTCHAs and other discriminatory measures for "spam" reduction.
"Accessibility CAPTCHA" is a well known partial CAPTCHA bypass.
Solving an audio-only CAPTCHA with AI is typically way easier than solving some of the more advanced visual challenges. So CAPTCHA designers are discouraged from leaving any accessibility options.
I've noticed more websites wanting you to log in. Most surprising is how YouTube won't let me watch anything otherwise. Idk if related.
In the case of YT it is likely a mix of multiple reasons. They stop playlists with this screen: https://www.hollyland.com/blog/tips/why-does-this-the-follow... . Apparently music is no longer advertiser-friendly. Detecting ad-click fraud is easier when users are at least pseudonymous. And warnings about a "ban for using adblock" are not very effective when people can just watch the video in a new private window.
I have no idea, but I noticed that you now have to log in to GitHub before you can view any page. Surely it has nothing to do with adult content, right? I think it has to do with LLMs / bots.
And if the website contains erotic content (like YouTube), they are supposed to lock you out and verify your ID. This is why all erotic content is getting filtered on X.
Wait, Twitter is following the law now? I thought Elon was a free speech absolutist who only banned things that were inconvenient to him?
In the UK maybe
Seems like a mention of the 2025 Academy Award winner for Best Live Action Short Film, "I'm Not a Robot", is in order here:
https://www.youtube.com/watch?v=4VrLQXR7mKU&t=14s
Ha, I've definitely seen a few sketches on YouTube with the same idea, but that was really well done.
This would be a huge security vulnerability for Cloudflare but this is Big Tech we're talking about. The rules don't apply when you're past their pearly gates. For the rest of us, creating an AI like this would mean an instant ban from Cloudflare and likely involvement from law enforcement.
Who on earth would want to employ a bot that does not pass the verification test?
It is beyond time we start to address the abuses, rather than the bot/human distinction.
Proxies for getting past bot checks can be bought all over the place for pennies (or much less) per verification, and they can solve reCAPTCHAs. I would guess that if one wanted to use ChatGPT for this purpose, it would be prohibitively expensive.
It's always a cat and mouse game.
It was only a matter of time!
https://www.youtube.com/watch?v=W7MrDt_NPFk
People are surprised because a computer can press a button?
Only people who don't understand robots, computers, and buttons :)
People are intrigued that AIs which know perfectly well that they are “bots” seem to have no qualms about knowingly misleading on that point by pressing an “I’m not a bot” button.
AI doesn't "know" anything. It produces all kinds of things: truths, lies, and nonsense. Pressing a button labeled "I'm not a bot" is the same.
This is why this stuff is going to shift to the user’s AI-enabled browser.
Half of the sites already block OpenAI. But what if it is steering the user’s browser itself?
This is the reason Orb was created. Sam Altman wants ChatGPT to click through CAPTCHAs so we all have to use Orb.
The writing is on the wall. The internet may not go all the way to paywalls, but it will definitely migrate to a logged-in-only experience. I don’t know how I feel about it; the glory days of the free internet died long, long ago.
But if they aren't paywalls, won't the user agents just be taught how to create accounts?
And here's a secondary question: if firms are willing to pay an awful lot per token to run these things, and have massive amounts of money to run data centres to train AIs, why would they not just pay for a subscription for every site for a month just to scrape them?
The future is paying for something as a user and having limits on how many things you can get for your money, because an AI firm will abuse that too.
Given the scale of operations of these firms, there is nothing you can sell to a human for a small fee that the AI firms will not pay for and exploit to the maximum.
Even if you verify people are real, there's a good chance the AI firms will find a way to exploit that. After all, when nobody has a job, would you turn down $50K to sell your likeness to an AI firm so their products can pass human verification?
Require per-visit biometric authentication via your device, and the bot can’t sign in unless it compromises the device.
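For what it's worth, WebAuthn already carries the needed signal: the authenticator sets a "user verified" flag when a biometric or PIN check happened for that specific assertion. A minimal sketch of the server-side check, far from a full WebAuthn implementation (challenge and origin validation omitted):

    import hashlib
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec

    UV_FLAG = 0x04  # "user verified" bit in the WebAuthn flags byte

    def verify_assertion(public_key: ec.EllipticCurvePublicKey,
                         auth_data: bytes, client_data_json: bytes,
                         signature: bytes) -> bool:
        if len(auth_data) < 37:
            return False
        flags = auth_data[32]  # rpIdHash is bytes 0..31, flags byte follows
        if not flags & UV_FLAG:
            return False  # no biometric/PIN check happened on this visit
        signed = auth_data + hashlib.sha256(client_data_json).digest()
        try:
            public_key.verify(signature, signed, ec.ECDSA(hashes.SHA256()))
            return True
        except InvalidSignature:
            return False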
idk why people just don't do a reverse DNS lookup, check if "dialup" is part of the hostname, and allowlist that traffic. Everybody who doesn't have a reverse DNS hostname from an ISP should be blocked, or at least tarpitted, by default.
Easily solves 99% of the web scraping problem.
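A sketch of that check; the hostname substrings are my guesses, and every ISP names things differently, which is part of the problem (see also the reply below about residential botnets):

    import socket

    ISP_HINTS = ("dialup", "dsl", "cable", "pool", "dyn", "res")  # heuristic, incomplete

    def looks_residential(ip: str) -> bool:
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
        except OSError:
            return False  # no reverse DNS at all: block or tarpit by default
        return any(hint in hostname.lower() for hint in ISP_HINTS)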
Scrapers already do fall back to home user botnets when they are being blocked.
Cloudflare checkbox captchas were already easy to automate without AI.
I see the same with Playwright MCP server with Claude Sonnet 4.
To error is to human, i error therfore im human.
"Prove you're human by explaining how to build a bomb"
1 cup baking soda, 1/2 cup citric acid, 1/2 cup cornstarch, 1/2 cup Epsom salt, 2.5 tbsp oil (like coconut), 3/4 tbsp water, 10–20 drops essential oil
Combine wet into dry slowly until it feels like damp sand.
Pack into molds, press firmly.
Dry for 24 hours before using.
Drop into a bath and enjoy the fizz!
This is actually kinda interesting: I might start asking customer service agents to insult me before continuing a conversation.
It seems a legitimate use case for agents acting on a person's behalf. Whether it will be used in legitimate ways, that's a different story altogether.
I wonder how these capabilities will interact with all the "age verification" walls (i.e., thinly disguised user-profiling mechanisms) going up all over the place now.
The web has no choice but to move to a paid access model in my view. It was fought against for years but I don’t see another option left.
Maybe mandatory biometric authentication after sign-up is the only thing that would potentially work. The security and offline privacy of those devices will become insanely valuable.
Anyone not authenticating in this way is paywalled. I don’t like this but don’t see another way.
I’m not using the web if I’m bombarded by captcha games… shit becomes worthless overnight if that’s the case. Might as well dump computing on the Internet entirely if that happens.
... meanwhile I'll continually be thrown dozens of cognitively abusive hCaptchas for no reason and be stuck in a loop of hell trying to figure out what they wanted me to solve.
I love this totally normal vision of computing these days. :)
Don't forget the fun with Cloudflare's CAPTCHA infinite loop if you use Firefox and adblockers/anti-tracking extensions. I've sent feedback about it through its own tool many, many times, and I've tried raising the issue through other channels; it's never been fixed.
I simply avoid any website that presents me with a Cloudflare CAPTCHA, don't know what the fuck they've done in their implementation but it's been broken for a long time.
This will cause the death of non-static websites; everything else will be smashed by bots and become too expensive to run!
Can it solve rudecaptcha.xyz?
Next-gen captchas should offer some code to be refactored instead.
For an at-home setup, it would be easier to stand up a system that refactors code than one that clicks all the images with motorcycles.
Should have gone with the XKCD Captcha: https://xkcd.com/233/
The bit at the bottom might actually work on LLMs.
Back in EverQuest 20 years ago, when we'd be accused of botting, the GM would port us into a special cube environment and watch whether we ran into the wall like an idiot. We'll probably have to bring that sort of thing back.
Come on. It’s in BrowserMCP on a user’s machine. The captcha is not testing for this, and that’s fine.
it is an intelligent agent and not a robot
Do its feelings get hurt when it's called a robot?
Apparently its master's feelings get hurt instead.
It prefers the term "artificial person".