Comment by viraptor

2 months ago

> cannot reliably close the gap here

Have you got any proof they're even trying? It's unlikely that's something their real customers are paying for.

I tried to reproduce it again just now, and ChatGPT 5 seems to be a lot more meticulous about running a Python script to double-check its work. It tells me that's because a warning in its system prompt tells it to. I don't know if that's proof (or even if ChatGPT reliably tells the truth about what's in its system prompt), but given what OpenAI does and doesn't publish, it's the closest I could reasonably expect.