Comment by somenameforme

12 hours ago

The Turing Test has not been meaningfully passed. Instead, we redefined the test to make it passable. In Turing's original conception, the competent interrogator and participants were all actively expected to collude against the machine. The entire point is that even with collusion, the machine would be able to do the same and pass. Instead, modern takes have paired incompetent interrogators with participants colluding with the machine, probably in an effort to be 'part of something historic'.

In "both" successes (probably more; referencing the two most high-profile - Eugene and the LLMs), the interrogators consistently asked pointless questions that had no meaningful chance of eliciting compelling information - 'How's your day? Do you like psychology?' etc. - and the participants not only made no effort to make their humanity clear, but were often actively adversarial, obviously answering such simple questions illogically, inappropriately, or in a 'computery' way on purpose. For instance, here is a dialog from a human in one of the tests:

----

[16:31:08] Judge: don't you thing the imitation game was more interesting before Turing got to it?

[16:32:03] Entity: I don't know. That was a long time ago.

[16:33:32] Judge: so you need to guess if I am male or female

[16:34:21] Entity: you have to be male or female

[16:34:34] Judge: or computer

----

And the tests are typically time-constrained by woefully poor typing skills (is this the new normal for the smartphone generation?) to the point that you tend to get anywhere from 1-5 interactions of just several words each. The above snippet was a complete interaction, so you get two responses from a human trying to trick the judge into deciding he's a computer. And obviously, a judge determining that the above was probably a computer says absolutely nothing about the quality of the computer's responses - instead it's some weird anti-Turing Test where humans successfully act like a [bad] computer, ruining the entire point of the test.

The problem with any metric is that it often ends up being gamed in order to be beaten, and this is a perfect example of that. I suspect that in a true run of the Turing Test we're still nowhere even remotely close to passing it.

I don't doubt that all of the formal Turing tests have been badly done. But I suspect that if you ran a proper one, at least one run would mis-judge an LLM. Maybe it's a low percentage, but that's vastly better than zero.

So I'd say we're at least "remotely close", which is sufficient for me to reconsider Searle.