Comment by bob1029

24 days ago

I would agree that OAIs GPT-5 family of models is a phase change over GPT-4.

In the ChatGPT product this is not immediately obvious and many people would strongly argue their preference for 4. However, once you introduce several complex tools and make tool calling mandatory, the difference becomes stark.

I've got an agent loop that will fail nearly every time on GPT-4. It works sometimes, but definitely not enough to go to production. GPT-5 with reasoning set to minimal works 100% of the time. $200 worth of tokens and it still hasn't failed to select the proper sequence of tools. It sometimes gets the arguments to the tools incorrect, but it's always holding the right ones now.

I was very skeptical based upon prior experience but flipping between the models makes it clear there has been recent stepwise progress.

I'll probably be $500 deep in tokens before the end of the month. I could barely go $20 before I called bullshit on this stuff last time.

Pretty sure there wasn't extensive training on tooling beforehand. I mean, god, during GPT-3 even getting a reliable json output was a battle and there were dedicated packages for json inference.

Now imagine local models with 95%+ reliable tool calling, you can do insane things when that's the reality.