Comment by JohnFen

3 days ago

> "current LLMs basically pass the Turing test" makes me feel like I've secretly been given much worse versions of all the LLMs in some kind of study.

I think you may believe that passing the Turing test is more difficult and meaningful than it actually is. Computers have been able to pass the Turing test for longer than genAI has been around. Even Turing didn't think it was a useful test in practice; he meant it as a thought experiment.

The problem with comparing against humans is: which humans? It's a skill issue. You can test a chess bot against grandmasters or against random undergrads, but you'll get very different results.

The original Turing test is a social game, like the Mafia party game. It's not a game people try to play very often. It's unclear if any bot could win competing against skilled human opponents who have actually practiced and know some tricks for detecting bots.

  • It depends on which version of the Turing test you use. That's largely true of the standard version, but in the later version the human player wins if they are incorrectly identified as a machine.

    The game is much harder if the human player is trying to pretend to be a machine.

I don’t think this is true. Before GPT-2, most people didn’t think the Turing test would be passed any time soon; passing it is a quite recent development.

I do agree (and I think there is a general consensus) that passing the Turing test is less meaningful than it may seem. It used to be considered an AGI-complete task, and this is now clearly not the case.

But I think it’s important to get the attribution right: LLMs were the tech that unexpectedly passed the Turing test.

Having LLMs capable of generating text based on human training data obviously raises the bar for a text-only evaluation of "are you human?", but LLM output is still fairly easy to spot. Knowing what LLMs are capable of (sometimes superhuman) and not capable of should make it fairly easy for a knowledgeable Turing test administrator to determine whether they are dealing with an LLM.

It would be a bit more difficult if you were dealing with an LLM agent tasked with faking a Turing test, as opposed to a naive LLM just responding as usual, but even there the LLM will reveal itself by the things it plainly can't do.

  • If you need a specialized skill set (deep knowledge of current LLM limitations) to distinguish between human and machine, then I would say the machine passes the Turing test.

    • OK, but that's just your own "fool some of the people some of the time" interpretation of what a Turing test should be, and by that measure ELIZA passed the Turing test too, which makes it rather meaningless.

      The intent of the Turing test (it was just a thought experiment) was that if you can't tell it's not AGI, then it is AGI, which is semi-reasonable, as long as it's not the village idiot administering the test! It was never intended to be "if it can fool some people some of the time, then it's AGI".

  • LLM output might be harder to spot when it's mostly commands to drive the browser.

    I often interact with the web all day and don't write any text a human could evaluate.

    • Perhaps, but that's somewhat off topic since that's not what Turing's thought experiment was about.

      However, I'd guess that, given a reasonable amount of data, an LLM interacting with websites would be fairly easy to distinguish from a human, since the LLM would be more purposeful: it'd be trying to fulfill a task, while a human may be curious, distracted by ads, put off by slow response times, etc.

      I don't think it's a very interesting question whether LLMs can sometimes generate output indistinguishable from a human's, since that is exactly what they were trained to do: mimic human-generated training samples. Apropos a Turing test, the question would be: can I tell this is not a human, given a reasonable amount of time to probe it in any way I care? But I think there is an unspoken assumption that the person administering the test is qualified to do so (else the result reflects administrator ability, not AGI-ability).

  • Easy to spot, assuming the LLM is not prompted to use a deliberately deceptive response style rather than its "friendly helpful AI assistant" persona. And even then, I've had lots of people swear to me that an emoji-laden "not this, but that" bundle of fluff looks totally like it could have been written by a human.

    • Yes, but there are things that an LLM architecturally just can't do, and LLM-specific failure modes that would still give it away, even if being instructed to be deceptive would make detection a bit harder.

      Obviously, as time goes on and chatbots/AI progress, it'll become harder and harder to distinguish them. Eventually we'll have AGI and AGI+, capable of everything we can do, including things such as emotional responses, but it'll still be detectable as an alien unless we get to the point of actually emulating a human being in considerable detail, as opposed to building an artificial brain with most or all of the same functionality (if not the flavor).

ELIZA was passing the Turing test 50+ years ago. But it's still a valid concept, just not for evaluating some(thing/one) accessing your website.

"Are you an LLM?" Poof, fails the Turing test.

Even if they lie, you could ask them 20 times and they'd repeat the lie without feeling annoyed: FAIL.

LLMs cannot pass the Turing test; it's easy to see they're not human. They always enjoy questions! And they never ask any!

  • You're trained to look for LLM-like output. My 70 year old mother is not. She thought cabbage tractor was real until I broke the news to her. It's not her fault either.

    The Turing test wasn't meant to be bulletproof, or even quantifiable. It was a thought experiment.

I guess that is where the disconnect is: if they mean the trivial thing, then bringing it up as evidence that "it's impossible to solve the problem" doesn't work.