Comment by throw219080123

6 days ago

> consider the fact that both Gemini and OpenAI got gold medal level performance

Yet ChatGPT 5 imagines API functions that are not there and cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub.

5 comments

throw219080123

simonw 6 days ago

Which is why you run it in a coding agent loop using something like Codex CLI - then it doesn't matter if it imagines a non-existent function because it will correct itself when it tries to run the code.

Can you expand on "cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub"? I have it do that all the time and it works really well for me (at least with modern "reasoning" models like GPT-5 and Claude 4.)

steveklabnik 6 days ago

As a human, I sometimes write code that does not compile first try. This does not mean that I am stupid, only that I can make mistakes. And the development process has guardrails against me making mistakes, namely, running the compiler.

alickz 6 days ago

Agreed
Infallibility is an unrealistic bar to mark LLMs against

orbital-decay 6 days ago

Yes. I don't see why these have to be mutually exclusive.

buildbot 6 days ago

I feel they are mutually inclusive! I don’t think you can meaningfully create new things if you must always be 100% factually correct, because you might not know what correct is for the new thing.