Comment by wat10000
9 hours ago
"Proper" may be doing some work here, but such a test was run last year and GPT-4.5 and LLaMa-3.1-405B both passed. Oddly, GPT-4.5 was judged as human significantly more often than chance. https://arxiv.org/abs/2503.23674