Comment by js8
14 hours ago
A very human thing to do: not telling us which model failed like this! They are not all alike; from what I observe, some are an order of magnitude better at this kind of thing than others.
I believe how "neurotypical" (for lack of a better word) you want a model to be is a design choice. (But I also believe model traits such as sycophancy, some hallucinations, or moral transgressions can be a side effect of training it to be subservient. It is similar with humans: they tend to do these things when they are forced to perform.)
Codex, in this case. I didn't even think to mention it. I'll update the post if it's actually relevant, which I guess it is.
EDIT: It's specifically GPT-5.4 High in the Codex harness.
Weird; for me it was too un-human at first, taking everything literally even when it didn't make sense. I started being more precise with my prompting, to the point where it felt like "metaprogramming in English".
Claude, on the other hand, was exactly as described in the article.
Also the exact model/version, if you haven't already.
Also, there are no specific examples of what the prompt was or what the result was. Just a big nothingburger.