
Comment by progx

1 day ago

In the AI century, how would you detect a real person versus an AI?

This thing, despite having "captcha" in its name, is not your typical captcha like hCaptcha or Google's reCAPTCHA, because it uses a proof-of-work mechanism instead of verification that requires user input, such as typing answers into textboxes or clicking on images.

AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them. This is highlighted by the existence of other proof-of-work solutions designed specifically to filter out AI bots, such as go-away[1] or Anubis[2].
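A minimal sketch of how such a proof-of-work challenge works (a generic illustration, not the actual go-away or Anubis protocol; the challenge token and difficulty value are made up):

    package main

    import (
        "crypto/sha256"
        "encoding/binary"
        "fmt"
        "math/bits"
    )

    // leadingZeroBits counts the leading zero bits of a SHA-256 digest.
    func leadingZeroBits(h [32]byte) int {
        n := 0
        for _, b := range h {
            if b != 0 {
                return n + bits.LeadingZeros8(b)
            }
            n += 8
        }
        return n
    }

    // solve brute-forces a nonce so that sha256(challenge || nonce) has at
    // least `difficulty` leading zero bits; the server only needs to check
    // one hash to verify the answer.
    func solve(challenge []byte, difficulty int) uint64 {
        buf := make([]byte, 8)
        for nonce := uint64(0); ; nonce++ {
            binary.BigEndian.PutUint64(buf, nonce)
            h := sha256.Sum256(append(append([]byte{}, challenge...), buf...))
            if leadingZeroBits(h) >= difficulty {
                return nonce
            }
        }
    }

    func main() {
        // Assumption: the server hands the browser a random token; at
        // difficulty 20 the loop needs ~2^20 hash attempts on average.
        challenge := []byte("server-issued-random-token")
        fmt.Println("nonce:", solve(challenge, 20))
    }

The point is the asymmetry: the client burns CPU finding the nonce, while the server verifies it with a single cheap hash.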

And yes, they work - when GNOME deployed one of these proof-of-work challenges on their GitLab instance, traffic on it fell by 97%[3].

[1] - https://git.gammaspectra.live/git/go-away

[2] - https://github.com/TecharoHQ/anubis

[3] - https://thelibre.news/foss-infrastructure-is-under-attack-by...: "According to Bart Piotrowski, in around two hours and a half they received 81k total requests, and out of those only 3% passed Anubi's proof of work, hinting at 97% of the traffic being bots."

  • > AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them. This is highlighted by the existence of other proof-of-work solutions designed specifically to filter out AI bots, such as go-away[1] or Anubis[2].

    Huh, they definitely can?

    go-away and Anubis reduce the load on your servers because bot operators can no longer scrape N pages per second without any drawbacks. Instead it gets really expensive to make thousands of requests, as every one of them is slow (see the rough numbers sketched after this reply).

    But for a user running their own AI agent that browses the web, things like Anubis and go-away aren't meant to stop them from accessing websites at all (nor do they); browsing just gets a tiny bit slower.

    Those tools are meant to stop site-wide scraping, not individual automated user-agents.
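    A back-of-envelope sketch of that asymmetry, with assumed numbers (the difficulty, hash rate, and page count are all made up for illustration):

        package main

        import "fmt"

        func main() {
            const (
                hashesPerChallenge = 1 << 20 // expected work at an assumed difficulty of 20
                hashesPerSecond    = 1e6     // rough single-core in-browser hash rate (assumption)
                pages              = 1_000_000
            )
            perPage := float64(hashesPerChallenge) / hashesPerSecond
            fmt.Printf("one page:  %.1f s\n", perPage)                    // ~1.0 s for a human
            fmt.Printf("1M pages:  %.1f CPU-days\n", perPage*pages/86400) // ~12 days for a scraper
        }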

  • > AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them.

    At least sometimes. I do not know about AI scraping specifically, but there are plenty of scraping solutions that do run JS (see the headless-browser sketch after this reply).

    It also puts off some genuine users like me who prefer to keep JS off.

    The 97% figure is only accurate if you assume a zero false-positive rate.
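    To make that concrete, here is a minimal sketch using chromedp, which drives a real headless Chrome, so any in-page proof-of-work script runs exactly as it would for a human visitor (the URL and the fixed wait are placeholders; a real scraper would wait for the post-challenge redirect instead):

        package main

        import (
            "context"
            "fmt"
            "time"

            "github.com/chromedp/chromedp"
        )

        func main() {
            ctx, cancel := chromedp.NewContext(context.Background())
            defer cancel()

            var html string
            err := chromedp.Run(ctx,
                chromedp.Navigate("https://example.com/protected"), // placeholder URL
                chromedp.Sleep(10*time.Second),                     // crude wait for the challenge JS to finish
                chromedp.OuterHTML("html", &html),
            )
            if err != nil {
                panic(err)
            }
            fmt.Println(len(html), "bytes fetched")
        }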

Certainly! Distinguishing between a real person and an AI in the AI century can be tricky, but some key signs include emotional depth, unpredictable creativity, personal experiences, and complex human intuition. AI, on the other hand, tends to rely on data patterns, structured reasoning, and lacks genuine lived experiences.

  • i enjoy that i cannot tell whether this is written by an AI, or by a human pretending to be an AI. my guess is human pretender!

AIs scrape data from web pages just like anything else does. I don't think their existence makes a difference.

  • AIs don't. AI companies do.

    Well, maybe. As far as I can see, the overt ones are using pretty reasonable rate limits, even though they're scraping in useless ways (every combination of git hash and file path on Gitea). Rather, it seems like the anonymous ones are the problem - and since they're anonymous, we have zero reason to believe they're AI companies. Some of them are running on Huawei Cloud, and I doubt OpenAI is using Huawei Cloud.