Comment by jillesvangurp

2 hours ago

I've been pretty happy sticking with codex 5.4 medium. I don't see a good case for switching to 5.5 at the cost of going through my token budget quicker.

There are misaligned incentives here between users just trying to get stuff done and AI companies competing on having the "smartest" model that passes benchmarks and continuously does some Nobel Peace Prize winning stuff. It's mostly overkill for the more mundane stuff normal people actually do with them. It's nice to have the option when you need that. But defaulting to that is not economical and a bit unnecessary.

There's also a difference between smart models and bigger context windows. Most of the progress in the last year was simply the context windows getting big enough to fit all/most of the stuff needed to solve issues. Before then, you had to carefully manage the context to not run out of space, and the windows wouldn't fit much more than a small hobby project.

With sub agents, the parent agent doesn't need to be a frontier model. It can delegate to smarter agents. And most stuff it delegates shouldn't need a frontier model. Wouldn't it be nice if it could decide on a case-by-case basis?

The walled gardens offered by OpenAI, Anthropic, and others currently default to one-size-fits-all "frontier" models. This is not sustainable. They should evolve to using smaller, effective models most of the time and escalate only as needed, either when the estimated complexity is high or when the small models fail. I'm guessing some open source alternatives to these walled gardens are probably already heading in that direction.
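A minimal sketch of what that kind of escalation could look like, assuming a list of model tiers ordered from cheapest to most capable. Everything here is hypothetical: `Tier`, `route`, the complexity estimator, and the convention that a tier returns `None` when it fails are illustration only, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Tier:
    """One model tier (hypothetical; not tied to any real provider)."""
    name: str
    max_complexity: float  # highest estimated complexity this tier should attempt
    run: Callable[[str], Optional[str]]  # returns None to signal failure


def route(
    task: str,
    tiers: Sequence[Tier],
    estimate: Callable[[str], float],
) -> Tuple[str, List[str]]:
    """Try tiers from cheapest to most capable.

    A tier is skipped when the estimated complexity exceeds its limit,
    and the task escalates to the next tier when a tier fails.
    Returns the result plus the names of the tiers that were tried.
    """
    score = estimate(task)
    tried: List[str] = []
    for tier in tiers:
        if score > tier.max_complexity:
            continue  # judged too hard for this tier, skip without spending tokens
        tried.append(tier.name)
        result = tier.run(task)
        if result is not None:
            return result, tried
        # Failure: fall through to the next, larger tier.
    raise RuntimeError(f"no tier could handle the task (tried: {tried})")
```

The last tier would normally get `max_complexity=float("inf")` so there is always a fallback. The hard part in practice is the estimator and detecting failure reliably, which this sketch leaves to the caller.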

The irony here is that with a walled garden, these companies are selling a premium experience. But in the current market that boils down to burning billions of investor cash to keep the GPUs going without much hope of profitability. Eventually, the surviving companies are going to have to compete on quality, cost, and margins. The smart approach would be to dynamically adapt token and context window sizes instead of blindly defaulting everything to the best possible. Don't boil the oceans for a simple email summary or a simple web UI. That stuff already worked well enough with models even a few years ago.

I used to be on 5.4 high for most of my work. I have switched completely to 5.5 medium now. I would highly recommend trying it out.

- 5.5 is significantly more token efficient than 5.4 - the same task often takes a third of the tokens

- because of this, it is also much faster to do the task

- you get high "intelligence" per token even after accounting for token efficiency - 5.5 medium is just under 5.4 pro levels of intelligence (imo). It has found tricky bugs for me that all other models failed at

So overall, ideally you will end up with a more intelligent, faster model for slightly less money.
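Whether fewer tokens actually ends up cheaper depends on the per-token price, so here is the arithmetic as a rough sketch. The token counts and prices below are made-up placeholders, not real pricing for any model, and `task_cost` and `is_cheaper` are just illustrative helpers.

```python
def task_cost(tokens: int, price_per_million: float) -> float:
    """Cost of one task: tokens used times the price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def is_cheaper(token_ratio: float, price_ratio: float) -> bool:
    """True if the new model costs less per task than the old one.

    token_ratio: new tokens / old tokens (1/3 means a third of the tokens)
    price_ratio: new price / old price per token
    """
    return token_ratio * price_ratio < 1.0


# Hypothetical numbers: the new model uses a third of the tokens
# but costs twice as much per token.
old_cost = task_cost(90_000, price_per_million=1.0)
new_cost = task_cost(30_000, price_per_million=2.0)
print(f"old: {old_cost:.3f}  new: {new_cost:.3f}")
```

With a third of the tokens, the new model stays cheaper per task as long as its per-token price is less than three times the old one. If the token savings on your own workload are smaller than that, a price increase can wipe them out.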

  • This is embarrassing but I find 5.4-mini on Low covers a substantial part of my and my colleagues' work.

    Back when it became expensive I learned to live with it and I find my "AI skills" (mainly communication) have a substantial impact on the efficiency of the model. Not saying my work is difficult, it's not, but I find there is quite a bit of wiggle room. Smaller models can still perform useful work, but you have to do the heavy lifting yourself. It saves a ton of money.

    I used to burn through 75% of my tokens in an hour or two. Now I can work all day and hit maybe 50-60% if I use it heavily.

  • We trialed 5.5 and the same queries produced worse results. Not worth the cost increase. Even if there’s a token efficiency gain the higher cost wipes that out.