← Back to context

Comment by syntaxing

1 day ago

Serious question, why is codex and mistral(vibe) not a real alternative?

Codex: Three reasons. I've used all extensively, for multiple months.

Main one is that it's ~3 times slower. This is the real dealbreaker, not quality. I can guarantee that if tomorrow we woke up and gpt-5.2-codex became the same speed as 4.5-opus without a change in quality, a huge number of people - not HNers but everyone price sensitive - would switch to Codex because it's so much cheaper per usage.

The second one is that it's a little worse at using tools, though 5.2-codex is pretty good at it.

The third is that its knowledge cutoff is further in the past than both Opus 4.5 and Gemini 3 that it's noticeable and annoying when you're working with more recent libraries. This is irrelevant if you're not using those.

For Gemini 3 Pro, it's the same first two reasons as Codex, though the tool calling gap is even much bigger.

Mistral is of course so far removed in quality that it's apples to oranges.

  • Unpopular opinion but I prefer slow and correct.

    My experience on Claude Max (still on it till end-of-month) has been frequent incomplete assignments and troubling decision making. I'll give you an example of each from yesterday.

    1. Asked Claude to implement the features in a v2_features.md doc. It completed 8 of 10 but 3 incorrectly. I gave GPT-5.1-Codex-Max (high) the same tasks and it completed 10 of 10 but took perhaps 5-10x as long. Annoyingly, with LLM variability, I can't know for sure if I tried Claude again it would get it correct. The only thing I do know is that GTP-5.2 and 5.1 do a lot more "double-checking" both prior to executing and after.

    2. I asked Claude to update a string being displayed in the UI of my app to display something else instead. The string is powered by a json config. Claude searched the code, somehow assumed it was being loaded by a db, did not find the json and opted to write code to overwrite whatever comes out of the 'db' (incorrect) to be what I asked for. This is... not desired behavior and the source of a category of hidden bugs that Claude has created in the past (other models do this as well but less often). Max took its time, found the source json file, and made the update in the correct place.

    I can only "sit back and let an agent code" if I trust that it'll do the work right. I don't need it fast, I need it done right. It's already saving me hours where I can do other things in parallel. So, I don't get this argument.

    That said, I have a Claude Max and OpenAI Pro subscription and use them both. I instead typically have Claude Opus work on UI and areas where I can visually confirm logic quickly (usually) and Codex in back-end code.

    I often wonder how much the complexity of codebases affects how people discuss these models.

  • Have you tried lower reasoning levels?

    • Yes and this makes it faster, but still quite a bit slower than Claude Code, and the tool use gap remains. Especially since the comparison for e.g. 5.2 Codex-Low is more like Sonnet than Opus, so that's the speed you're competing with.

The Claude models are still the best at what they do, right now GLM is just barely scratching sonnet 4.5 quality, mistral isnt really usable for real codebases and gemini is kind of in a weird spot where it's sometimes better then Claude at small targeted changes but randomly goes off the rails. Haven't tried codex recently but the last time I did the model thought for 27 minutes straight and then gave me about the same (incorrect) output that opus would have in 20 seconds. Anthropics models are their only moat as demonstrated by their cutting off of tools other then Claude code on their coding plans.

I tried codex, using my same sandbox setup with it. Normally I work with sonnet in code, but it was stuck on a problem for hours, and I thought hmm, let me try codex. Codex just started monkey patching stuff and broke everything within like 3-4 prompts. I said f-this, went back to my last commit, and tried Opus this time in code, which fixed the problem within 2 prompts.

So yeah, codex kinda sucks to me. Maybe I'll try mistral.