Comment by XCSme

6 hours ago

GPT 5.5 is quite a big leap; it's a lot better than Opus 4.7 for agentic coding.

4 comments

XCSme

Arena only allows very small context sizes, so it's a noisy benchmark for what we care about IRL.

Better in what ways? I'm just curious about your experience.

XCSme 5 hours ago
Consistency, not making mistakes.
- mettamage 5 hours ago
  
  Ahh... that is indeed an issue I have with Claude. I'll check it out!