Comment by WinstonSmith84
1 day ago
Not discussing Mythos here, but Opus. Opus has been significantly better at SWE than GPT or Gemini in my experience - which leaves me confused about why Opus is ranking clearly lower than GPT, and even lower than Gemini.
When did you last compare them? Codex right now is considerably better in my experience. Can't speak for Gemini.
Tried Gemini 2 weeks ago to see where it's at, with gemini-cli.
Failed to use tools, failed to follow instructions, and then went into deranged loop mode.
Essentially, it's where it was 1.5 years ago, when I last tried it.
It's honestly unbelievable how Google managed to fail so miserably at this.
Their harness might be behind
1 reply →
It’s great on AI Studio. Harness issues, I agree.
Agree, I never actually had great success with Opus. I think it's the failures that are annoying: it's probably better than codex when it's "good", but it fails in annoying ways that I think codex very seldom does.
I wouldn't call codex considerably better. It may depend on the specific codebase and your expectations, but codex produces more "abstraction for the sake of abstraction" even on simple tasks, while opus in my experience usually chooses the right level of abstraction for the given task.
A secret art known to the cognoscenti as "benchmark gaming".