Comment by wat10000

4 hours ago

In one study, GPT-4.5 was judged to be human 73% of the time, which means that the actual human was judged to be human only 27% of the time. More human than human, as Tyrell would say.

Edit: folks, the standard Turing test involves a computer and a human, and then a judge communicating with both and giving a verdict about which one is the human. The percentages for the two entities being judged will add up to exactly 100%. That's how this test was conducted. Please don't assume I'm a moron.

7 comments

dwpdwpdwpdwpdwp 4 hours ago

The implication would be that GPT-4.5 was not judged to be human 27% of the time. You can't determine how often humans were judged correctly as humans from that data point.

jmalicki 3 hours ago

The structure of the test was that there was one human and one AI conversation partner, and the rater had to choose which one was which.
Given that structure, you can judge from that data point.

jmalicki 3 hours ago

That was also before the crazy AI hysteria we have today with the em-dash police everywhere.

wat10000 9 minutes ago

For the test to be free of bias, we’ll have to ensure all the humans are from Nigeria.

Melatonic 4 hours ago

Those stats don't necessarily line up that way. Do you have a link?

jmalicki 3 hours ago

Given the way the test was structured, it does line up.
https://arxiv.org/abs/2503.23674

Melatonic 3 hours ago

Surprisingly good. I wonder how they would have done without the 5-minute limit on conversations (average of 8 messages per convo per the study).