Comment by mustaphah

2 months ago

I speculate something similar (or even worse) is going on with Terminal-Bench [1].

Like, seriously, how come all these agents are beating Claude Code? In practice, they are shitty and not even close. Yes. I tried them.

[1] https://www.tbench.ai/leaderboard

Claude Code was severely degraded over the last few weeks; very simple terminal prompts that it never had problems with were failing for me.

  • Follow the money. Or how much comes from your pocket vs. VC and big tech speculators.

    • They did a big fundraising round right after so it's easy to suspect they were manipulating profitability growth for it.

They're all using Claude, so idk. Claude Code is just a program; the magic is mainly in the model.