Lightweight open source reCaptcha alternative

3 days ago (github.com)

It’s crazy how much of the internet and our app stacks depend on proprietary hosted service integrations that will almost certainly disappear or break in time. Sure, it’s convenient to get off the ground with, but it doesn’t make sense to me to gate your functionality on a third party that can easily break or slip out from under you. It would be one thing if proprietary software were distributed in a form you could keep operating and using on your own, but even that is obviously inferior to being able to “repair your own equipment”.

  • In the startup world it is a huge economic advantage if you can prototype an idea in days that would have taken months or years. The tradeoff is acquiring technical debt, but we seem capable of resolving that after the concept has found product-market fit.

    • Yes, but it's not just startups, and people do not seem to actually resolve it.

      Lots of big businesses use recaptcha, quite often unnecessarily. If I need to log in with 2FA to use a service, does it really need recaptcha?

      Similarly, cloudflare sends you emails telling you how many bots and attacks it has stopped - but you do not know how many false positives there were.

    • Citation, as they say, is needed.

      As far as I can tell, most startups resolve their technical debt by failing, and the majority of the rest resolve their debt by being acquired by a company which replaces the original service entirely in 1-3 years because it's too hard to integrate as-is.

  • Not only that, but it's also totally acceptable now to broadcast your users' data to a megaton of external services for no good reason. If people had some grasp of what is going on and it was visible to them, they would complain very loudly about it in your face.

  • > It’s crazy how much of the internet and our app stacks depend on proprietary hosted service integrations that will almost certainly disappear or break in time. Sure it’s convenient to get off the ground with but it doesn’t make sense to me to gate your functionality on a third party that can easily break or slip out from under you.

    At least with captchas, it's somewhat understandable with the arms-race aspect. The third party does the work of engaging in the arms race, so you don't have to, but the tradeoff is what you describe.

  • reCaptcha is routinely broken for me. Almost every time I see it I have to solve it about a dozen times, then it decides I’m not human. After 2-3 page refreshes it does let up but it’s frustrating as hell.

    • Are you on Linux by any chance? For some reason this is now deemed 'suspicious' by recaptcha and cloudflare :( Especially if you use Firefox. It's driving me crazy getting bombarded by these.

The real secret of an effective captcha-like system is to collect lots of data, identify suspicious patterns, validate them (checking what kind of data exposes a bot-like client), and then use this to serve dynamic challenges based on those signals.

Example: if the system identifies the user as a likely bot, it serves a deliberately more expensive PoW challenge.
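
To make that concrete, here is a minimal sketch of the difficulty-scaling idea. The signal names, weights, and thresholds are made up, and the puzzle is a generic leading-zero-bits SHA-256 PoW rather than anything from the linked project:

    import hashlib
    import secrets

    # Hypothetical risk signals collected server-side; names and weights are invented.
    def risk_score(signals: dict) -> int:
        score = 0
        if signals.get("datacenter_ip"):              # request comes from a hosting provider
            score += 2
        if signals.get("headless_hints"):             # browser looks automated
            score += 2
        if signals.get("requests_last_minute", 0) > 30:
            score += 1
        return score

    def make_challenge(signals: dict) -> dict:
        # More suspicious client -> more required leading zero bits -> more hashing.
        difficulty_bits = 12 + 4 * min(risk_score(signals), 4)   # 12..28 bits
        return {"salt": secrets.token_hex(16), "difficulty_bits": difficulty_bits}

    def verify(challenge: dict, nonce: int) -> bool:
        digest = hashlib.sha256(f"{challenge['salt']}{nonce}".encode()).digest()
        leading_zero_bits = 256 - int.from_bytes(digest, "big").bit_length()
        return leading_zero_bits >= challenge["difficulty_bits"]

A presumed-human client solves the 12-bit case in a few thousand hashes on average, while a client flagged as bot-like has to grind through on the order of 2^28 (~270 million) hashes for the hardest tier.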

  • Maybe somebody could explain to me why your comment is in a different shade of grey?

    I think somebody might have flagged your comment, but what you said is actually true.

    This is one of the reasons people say Cloudflare owns the majority of the internet, but I think I am okay with that since Cloudflare is pretty chill and provides great services. Still, it just shows that the internet isn't that decentralized.

    But Google's captcha is literally tracking you IIRC. I would personally prefer hCaptcha if you want a centralized solution, or Anubis if you want to self-host (I prefer Anubis, I guess).

reCaptcha is useless; it only annoys actual users. I lost 15 minutes last week on the MIUI site with their trash reCaptcha. The point is to steal more data from you.

Why call it a CAPTCHA if it is not even trying to tell Computers and Humans Apart (CHA)?

This is only trying to tell human browsers apart from bot browsers. Not even that; it seems all it does is slow all browsers down equally.

  • Because <s>human</s> western society is in its post-competence era. It doesn't matter whether you can do your job, only whether your manager thinks you can, and they don't understand your job so they use all the wrong metrics.

    Like whether there's a checkbox you have to click, and whether it spins for a while when you click it. That's a CAPTCHA now. And working is when your butt is in the chair. And investing is when you give someone money and they promise to give more back later. And food is things that fit in your mouth and don't kill you. And free speech is when you get turned away at the border for disliking the president on social media. And top-of-the-line CPUs are ones that die within 24 months. Meanwhile the totalitarian dictatorship across the pond actually does all these things better somehow (except the politics). https://en.wikipedia.org/wiki/HyperNormalisation#Etymology

Can someone explain why a robot would not be able to calculate the PoW?

  • It could, the idea is just to tip the economics such that it's not worth it for the bot operator. That kind of abuse typically happens at a vast scale where the cost of solving the challenges adds up fast.
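
    As a back-of-envelope illustration (all numbers here are assumptions, just to show the scaling): a challenge that costs a real visitor about a second is barely noticeable, but it multiplies across a large automated campaign.

        seconds_per_challenge = 1.0        # assumed solve time on commodity hardware
        bot_requests = 10_000_000          # a large scraping or spam campaign
        extra_cpu_days = bot_requests * seconds_per_challenge / 86_400
        print(extra_cpu_days)              # ≈ 116 CPU-days the operator has to pay for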

    • Botnets don't even use their own hardware.

      Why would someone renting dirt cheap botnet time care if the requests to your site take a few seconds longer?

      Plus, the requests are still getting through after waiting a few seconds, so it does nothing for the website operator and just burns battery for legit users.

    • That's definitely the idea.

      So the crazy decentralized mystery botnet(s) that are affecting many of us -- don't seem to be that worried about cost. They are making millions of duplicate requests for duplicate useless content, it's pretty wild.

      On the other hand, they ALSO don't seem to be running user-agents that execute JavaScript.

      This is in the findings of a group of some of my colleagues at peer non-profits that have been sharing notes to try to understand what's going on.

      So the fact that they don't run JS at present means that PoW would stop them -- but so would something much simpler and cheaper relying on JS.

      If this becomes popular, could they afford to run JS and to calculate the PoW?

      It's really unclear. The behavior of these things does not make sense to me enough to have much of a theory about what their cost/benefits or budgets are, it's all a mystery to me.

      Definitely hoping someone manages to figure out who's really behind this and why at some point. (I am definitely not assuming it's a single entity either.)

  • I think calling this a "recaptcha alternative" is slightly misleading.

    There are two problems some website operators encounter:

    A) How do I ensure no one DDoSes me (deliberately or inadvertently)?

    B) How can I ensure this client is actually a human, not a robot?

    Things like ReCaptcha aimed to solve B, not A. But the submitted solution seems to be more for A, as a PoW can be (and in practice must be) calculated by a machine, not a human, while ReCaptcha is supposed to be the opposite: solvable only by a human.

  • I think the general idea isn't that they can't but that they either won't, because they're not executing JS, or that it would slow them down enough to effectively cripple them.

    • As long as they're not executing JS, they don't really need a PoW to stop them, though. Something much simpler that requires executing JS would do.

      I might at any rate set my PoW to be relatively cheap, which would do for anyone not executing JS.

Hmm, how do they know you have calculated the PoW without setting a cookie? Or do you have to calculate it on every page load?

  • Yes, I was wondering what is to stop you replaying the same PoW multiple times. All I can find is:

    > To prevent the vulnerability of “replay attacks,” where a client resubmits the same solution multiple times, the server should implement measures that invalidate previously solved challenges.

    > The server should maintain a registry of solved challenges and reject any submissions that attempt to reuse a challenge that has already been successfully solved.

    This doesn't seem very scalable? Or am I missing something?
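
    For what it's worth, that registry doesn't have to grow forever if challenges expire: the server only needs to remember a solution for as long as its challenge is valid. A minimal sketch of that idea (the TTL and names are invented; this is not ALTCHA's actual implementation):

        import time

        CHALLENGE_TTL = 300  # seconds a challenge stays valid (assumed)

        class SolvedChallengeRegistry:
            """Remembers which challenges have already been redeemed, until they expire."""

            def __init__(self):
                self._redeemed = {}  # challenge_id -> expiry timestamp

            def redeem(self, challenge_id: str, issued_at: float) -> bool:
                now = time.time()
                # Prune expired entries so memory stays proportional to the request rate.
                self._redeemed = {c: exp for c, exp in self._redeemed.items() if exp > now}
                if now - issued_at > CHALLENGE_TTL:
                    return False   # challenge expired, reject regardless
                if challenge_id in self._redeemed:
                    return False   # replay attempt
                self._redeemed[challenge_id] = issued_at + CHALLENGE_TTL
                return True

    A shared store like Redis with one expiring key per challenge does the same thing across multiple servers, so the working set stays bounded by the challenge lifetime rather than growing forever.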

  • yeah, I need more info to understand what's up.

    Maybe it's only used on individual form submit (like the classic captcha use-case), and not on a page load, and it does have to be recalculated on every form submit?

In the AI century, how would you tell a real person from an AI?

  • This thing, despite using "captcha" in its name, is not your typical captcha like hCaptcha or Google's, because it uses a proof-of-work mechanism instead of typing answers into textboxes, clicking on images, or other means of verification requiring user input.

    AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them. This is highlighted by the existence of other proof-of-work solutions designed specifically to filter out AI bots, like go-away[1] or Anubis[2].

    And yes, they work - once GNOME deployed one of these proof-of-work challenges on their gitlab instance, traffic on it fell by 97%[3].

    [1] - https://git.gammaspectra.live/git/go-away

    [2] - https://github.com/TecharoHQ/anubis

    [3] - https://thelibre.news/foss-infrastructure-is-under-attack-by...: "According to Bart Piotrowski, in around two hours and a half they received 81k total requests, and out of those only 3% passed Anubi's proof of work, hinting at 97% of the traffic being bots."

    • > AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them. This is highlighted by the existence of other proof-of-work solutions designed specifically to filter out AI bots, like go-away[1] or Anubis[2].

      Huh, they definitely can?

      go-away and Anubis reduce the load on your servers, as bot operators cannot just scrape N pages per second without any drawbacks. Instead it gets really expensive to make thousands of requests, as they're all really slow.

      But for a user who uses their own AI agent that browses the web, things like Anubis and go-away aren't meant to (nor do they) stop them from accessing the websites at all; it'll just be a tiny bit slower.

      Those tools are meant to stop site-wide scraping, not individual automatic user-agents.

    • > AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them.

      At least sometimes. I do not know about AI scraping but there are plenty of scraping solutions that do run JS.

      It also puts off some genuine users like me who prefer to keep JS off.

      The 97% is only accurate if you assume a zero false positive rate.

  • Certainly! Distinguishing between a real person and an AI in the AI century can be tricky, but some key signs include emotional depth, unpredictable creativity, personal experiences, and complex human intuition. AI, on the other hand, tends to rely on data patterns, structured reasoning, and lacks genuine lived experiences.

    • i enjoy that i cannot tell whether this is written by an AI, or by a human pretending to be an AI. my guess is human pretender!

  • AIs scrape data from web pages just like anything else does. I don't think their existence makes a difference.

    • AIs don't. AI companies do.

      Well, maybe. As far as I can see, the overt ones are using pretty reasonable rate limits, even though they're scraping in useless ways (every combination of git hash and file path on Gitea). Rather, it seems like the anonymous ones are the problem - and since they're anonymous, we have zero reason to believe they're AI companies. Some of them are running on Huawei Cloud. I doubt OpenAI is using Huawei Cloud.

CAPTCHA stood for “Completely Automated Public Turing Test to tell Computers and Humans Apart”.

By this point, it’s obvious that that has failed, and even that no general solution is possible any more.

ALTCHA… telling Computers and Humans Apart? No, this is proof of work, meaning it’s just about making things expensive—abuse control, not actually distinguishing between computers and humans.

In fact, in https://altcha.org/captcha/ one of the headings is Inclusive to Robots! This is so far the opposite of traditional CAPTCHA, on the technical side, that it’s mildly hilarious. (Socially, they largely amount to the same thing—people never did actually care about computers, just abusive bots.)

Then the question is: what is the proof of work mechanism? How robust are things going to be, and can you ensure attacking will remain expensive, without burdening users too much?

https://altcha.org/docs/proof-of-work/ indicates it’s SHA hashing, not something like scrypt. Uh oh. The best specialised hardware is several million times as good as good laptops¹, let alone cheap phones. If this were to become popular, bots would switch to such hardware, probably making the cost of attacking practically negligible. https://altcha.org/docs/complexity/ shows they’ve thought about these things, but I feel that although it will work for a while, it’s ultimately a doomed game. And in the mean time, you can normally go waaaay simpler and less intrusive: most bots are extremely dumb.
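
If I'm reading the linked PoW docs right, the scheme is roughly: the server publishes a salt plus the SHA-256 hash of that salt concatenated with a secret number, and the client brute-forces the number. A simplified sketch of that style of challenge (the parameter names and encoding here are my assumptions, not the project's actual API):

    import hashlib
    import secrets

    def create_challenge(max_number: int = 100_000) -> dict:
        salt = secrets.token_hex(12)
        secret = secrets.randbelow(max_number)   # the number the client must find
        target = hashlib.sha256(f"{salt}{secret}".encode()).hexdigest()
        return {"salt": salt, "target": target, "max_number": max_number}

    def solve_challenge(challenge: dict):
        # Brute force: this loop is the "work" the client pays for.
        for candidate in range(challenge["max_number"]):
            digest = hashlib.sha256(f"{challenge['salt']}{candidate}".encode()).hexdigest()
            if digest == challenge["target"]:
                return candidate
        return None

    challenge = create_challenge()
    print(solve_challenge(challenge))   # ~max_number/2 hashes on average

All of the "work" is plain SHA-256 over short strings, which is exactly what Bitcoin-style mining hardware is optimised for; hence the asymmetry in the footnote below.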

Is “captcha” heading in the direction of meaning “bad rate limiting”?

Because really that’s what this stuff is: rate limiting that trusts that clients don’t have lots of compute power conveniently available, but will get vaporised by powerful and intentional adversaries.

—⁂—

¹ On the https://altcha.org/docs/complexity/ test, a comparatively ideal browser on my 5800HS laptop might reach 500,000 SHA-256 hashes per second at a cost of at least 25W. (Chromium gets half this with ~50% CPU usage; Firefox one tenth, altogether failing to load the cores for some reason.) The most energy-efficient commercial Bitcoin miners seem to be doing around 80 billion of these hashes per watt-second. That’s four million times as good. You cannot bridge such a divide.
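
Spelled out with those figures, so the "four million" isn't mysterious:

    laptop_hashes_per_joule = 500_000 / 25   # ≈ 20,000 hashes per watt-second
    miner_hashes_per_joule = 80e9            # ≈ 8×10^10 hashes per watt-second
    print(miner_hashes_per_joule / laptop_hashes_per_joule)   # ≈ 4,000,000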

The purpose of reCaptcha is to enhance your Google user profile and to deny legitimate users. How does this alternative accomplish those things?

This appears to be a proof-of-work, like Anubis. Real captchas collect much more fingerprinting data to ensure that only users with the latest version of Chrome, the latest version of Windows, and an Nvidia graphics card, can use the site.

  • Yeah, this fails at its most important task: making those filthy, dirty Firefox users click on bridges for an hour a day.

    On topic though, how does this improve on hCaptcha?

    • > On topic though, how does this improve on hCaptcha?

      Cloud vs self-hosted, click annoying things challenge vs automatic proof of work. Or are there other hCaptcha versions and I just never realized it?

    • I know some people who quickly give up and stop using the service when they run into hCaptcha puzzles.

      I've been bewildered for some time as well, honestly; it took me a while to figure out the first one I ran into.

      And trying one now, fully knowing that I'd have to solve one, I was dumbfounded by the puzzle I got; it took me a few seconds to understand it.

      Cloudflare's ones are horrible and a plague (although they might have slightly improved recently), but I'm not certain I'd prefer hCaptchas over them.

  • I know this is partially a joke, but I’d like to mention that as a Firefox user with uBo and uMatrix, I almost never have to solve challenges with ReCaptcha.

    • How? Do you just allow all cookies/scripts/XHR in your uMatrix? I'm on a similar config and I get captchas far more often than any other users on the same network for some reason.

    • On the other hand, as a Firefox user I simply cannot pass Cloudflare's verification at all; I always end up in a loop. It's been like that for more than a year... Sometimes it does work in a private window, no idea why or how since I have the same extensions enabled.
