Comment by deaux
16 hours ago
Codex: Three reasons. I've used all extensively, for multiple months.
The main one is that it's ~3 times slower. This is the real dealbreaker, not quality. I can guarantee that if tomorrow we woke up and gpt-5.2-codex became the same speed as 4.5-opus with no change in quality, a huge number of people - not HNers, but everyone who is price-sensitive - would switch to Codex because it's so much cheaper per usage.
The second one is that it's a little worse at using tools, though 5.2-codex is pretty good at it.
The third is that its knowledge cutoff is far enough behind both Opus 4.5 and Gemini 3 that it's noticeable and annoying when you're working with more recent libraries. This is irrelevant if you're not using those.
For Gemini 3 Pro, it's the same first two reasons as Codex, though the tool-calling gap is much bigger still.
Mistral is of course so far removed in quality that it's apples to oranges.
Unpopular opinion but I prefer slow and correct.
My experience on Claude Max (still on it till the end of the month) has been one of frequent incomplete assignments and troubling decision making. I'll give you an example of each from yesterday.
1. I asked Claude to implement the features in a v2_features.md doc. It completed 8 of the 10, but 3 of those incorrectly. I gave GPT-5.1-Codex-Max (high) the same tasks and it completed 10 of 10, but took perhaps 5-10x as long. Annoyingly, with LLM variability, I can't know for sure whether Claude would get it right if I tried again. The only thing I do know is that GPT-5.2 and 5.1 do a lot more "double-checking", both before executing and after.
2. I asked Claude to update a string displayed in my app's UI to show something else instead. The string is powered by a json config. Claude searched the code, somehow assumed the string was being loaded from a db, never found the json, and opted to write code that overwrites whatever comes out of the (nonexistent) 'db' with what I asked for. This is... not desired behavior, and it's the source of a whole category of hidden bugs that Claude has created in the past (other models do this as well, but less often). Codex-Max took its time, found the source json file, and made the update in the correct place.
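To make the difference concrete, here's a rough sketch of the two fixes (the file name, config key, and strings are made up for illustration, not from my actual app):

    // config.json (hypothetical): { "welcomeBanner": "Welcome back!" }
    import config from "./config.json";

    // The fix I wanted: change "welcomeBanner" in config.json itself and
    // leave this accessor alone.
    export function getBannerText(): string {
      return config.welcomeBanner;
    }

    // The fix Claude wrote: it assumed the value came from a db it couldn't
    // find, so it hard-coded an override just before display. The old value
    // still lives in config.json, so any other consumer of it silently
    // diverges from the UI - exactly the category of hidden bug I mean.
    export function getBannerTextPatched(): string {
      return "New banner text";
    }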
I can only "sit back and let an agent code" if I trust that it'll do the work right. I don't need it fast, I need it done right. It's already saving me hours where I can do other things in parallel. So, I don't get this argument.
That said, I have both a Claude Max and an OpenAI Pro subscription and use them both: I typically have Claude Opus work on the UI and areas where I can (usually) visually confirm the logic quickly, and Codex on back-end code.
I often wonder how much the complexity of codebases affects how people discuss these models.
Have you tried lower reasoning levels?
Yes, and this makes it faster, but it's still quite a bit slower than Claude Code, and the tool-use gap remains. Especially since the right comparison for, e.g., 5.2-Codex on low is more like Sonnet than Opus, so that's the speed you're competing with.