Comment by WinstonSmith84
1 day ago
Not discussing Mythos here, but Opus. Opus has been significantly better at SWE than GPT or Gemini in my experience - which leaves me confused about why Opus is ranking clearly lower than GPT, and even lower than Gemini.
When did you last compare them? Codex right now is considerably better in my experience. Can't speak for Gemini.
Tried Gemini 2 weeks ago to see where it's at, with gemini-cli.
Failed to use tools, failed to follow instructions, and then went into deranged loop mode.
Essentially, it's where it was 1.5 years ago, when I last tried it.
It's honestly unbelievable how Google managed to fail so miserably at this.
Their harness might be behind
1 reply →
It’s great on AI Studio. Harness issues, I agree.
Agree, I never actually had great success with Opus. I think it's the failures that are annoying: it's probably better than codex when it's "good", but it fails in annoying ways that I think codex very seldom does.
I wouldn't call codex considerably better. It may depend on the specific codebase and your expectations, but codex produces more "abstraction for the sake of abstraction" even on simple tasks, while opus in my experience usually chooses the right level of abstraction for the given task.
A secret art known to the cognoscenti as "benchmark gaming".