Comment by mattjoyce

8 days ago

At the right price, these models don't need to be the best; good enough will do. I think we're fast approaching good enough for most users.

22 comments

kouteiheika 8 days ago

This. Here's a quick experiment I did yesterday.

I got a new $20 Claude subscription to try the new Fable model. I gave it a single prompt, and it barely finished, using up my whole session quota (it was at ~95% when it finished) and 10% of my weekly quota.

For comparison, with the Kimi Code $40 subscription I can pretty much constantly run two/three agents in parallel for the whole week, and I never run out of quota. I can blindly throw it at anything and everything without worrying about hitting the limits. (And it's not exactly a cheap model to run -- it has 1 trillion parameters!)

Is Kimi as good as Claude? Of course not. But you don't need the absolute state of the art for most things. If I don't have exceptionally difficult tasks, it makes no sense to use Claude. Just throw Kimi at it, and even if it needs to run 2 or 3 times longer in the background I don't care, because I'm not running out of tokens there.

nl 8 days ago
A word of caution on this.
I've tried this too, and was disappointed.
Kimi generally benchmarks at "a bit more intelligent than Sonnet Medium" levels[1] and I'd agree broadly with this assessment.
If you have adapted your coding to rely on the agentic style that is doable in Opus 4.7+ then you will find Kimi disappointing.
If you are using it in a more targeted way then it can work well.
[1] https://artificialanalysis.ai/agents/coding-agents?agents=cl...
- kouteiheika 8 days ago
  
  Yes, I would agree with this.
  I think it works best when you're using the agent in a more hands-on way with a targeted prompt. If you're obsessive about code quality like I am (so you thoroughly review and, when needed, reprompt or even rewrite what the agent does) then you'll be fine, but if you like to just throw a prompt at the wall and expect it to plan and execute the whole thing perfectly then you'll be disappointed.
  A middle-ground trick one can use is to have Opus (or Fable now) plan the whole thing and get something cheaper like Kimi execute on it.
  
- poly2it 8 days ago
  
  Is there any open model that can emulate the agentic experience you get with Opus 4.7?
  
EagnaIonat 7 days ago

> This. Here's a quick experiment I did yesterday.
It's like running a sports car and then complaining it burns through petrol too fast.
The truth is that the model, while impressive, is not needed for much of what people do.
Local models can do the work and just offload heavy lifting to the cloud models.

JKCalhoun 8 days ago

Not only that, it's easy to let ethics steer my choice as well. And at this point I suspect OpenAI will never earn my respect.

emodendroket 7 days ago

I find it is a quite reliable workflow to ask a strong model to design a plan and then point a weaker one at executing. The agent harnesses themselves are baking in similar concepts though.

panos_news 8 days ago

Yeah, that's how I feel too. I am totally fine with xHigh GPT 5.5 when it comes to coding.

opennash 8 days ago

agreed, unlimited gpt5.5 fast is sufficient for 90% of my use cases. Tried Fable, nice to have but we don't really need it.

boc 8 days ago

OTOH, using the best is a competitive advantage when time = money. It's like giving your engineers a slow laptop because it's cheaper. It may be cheaper but not worth the cost.

atraac 8 days ago

Unless your job is purely producing code pointlessly, this is not really a good comparison. Most of the time is spent on understanding the problem and figuring out solutions, not waiting on the CPU.
lelanthran 8 days ago
> OTOH, using the best is a competitive advantage when time = money. It's like giving your engineers a slow laptop because it's cheaper. It may be cheaper but not worth the cost.
That doesn't imply giving your devs the best laptop makes any difference.
How much more productive will your devs be if you upgrade them from a 32GB RAM, 8-core laptop to a 768GB RAM, 96-core Threadripper?
In your analogy, Kimi may not be the 4-core Celeron with 4GB of RAM; it's more like the 8-core AMD with 32GB of RAM.
- knollimar 8 days ago
  
  768GB seems oddly specific for Kimi
bushbaba 8 days ago

Not necessarily; inference speed also has a huge time aspect. For example, Anthropic's models take nearly twice as long as OpenAI's for my tasks, with both having similar success rates.
