Comment by DougBTX

2 months ago

> Are the AI companies really living in an echo chamber?

The author tested 12 models, and only one was consistently wrong. More than half were correct 100% of the time.

A better conclusion would be that there’s something in particular wrong with GPT-5 Chat; all the other GPT-5 variants are OK. I wonder what’s different?

This “one model” was supposed to be the state-of-the-art model, from the company that had a model named “strawberry” in honor of the same problem. And other models struggle with this extremely basic question too.