Comment by PedroBatista

9 hours ago

Great to know, but what was the cost both in terms of $$ and tokens used?

Not to invalidate these benchmark results because they are useful, but the real usefulness it what they are capable to do when real people interact with them at scale.

Regardless, these are good news, because now that Microsoft is basically giving up their all-in strategy with Github's Copilot and Anthropic is playing the "I'm too good for you" game, it's about time for them to get pressed into not making this AI world into a divide between the haves and the have-nots.

2 comments

PedroBatista

keyle 8 hours ago

Re pricing. Never as high as frontier commercial models.

CryptoBanker 7 hours ago

You’d be surprised with some long running complex tasks. I’ve seen Kimi spend 8 minutes (total) thinking on a task that Claude got done in 30 seconds. They both ultimately got it right, but Kimi spent ~$2.25 to Claude’s ~$0.20