Comment by freediver

5 months ago

Kagi LLM benchmark updated with general purpose and thinking mode for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

Appears to be the second most capable general purpose LLM we tried (second to Gemini 2.0 Pro, ahead of GPT-4o). Less impressive in thinking mode, about at the same level as o1-mini and o3-mini (with an 8192 token thinking budget).

Overall a very nice update: you get a higher quality, higher speed model at the same price.

Hope to enable it in Kagi Assistant within 24h!

Thank you to the Kagi team for such a fast turnaround on new LLMs being accessible via the Assistant! The value of Kagi Assistant has been a no-brainer for me.

  • [flagged]

    • I find that giving encouraging messages when you're grateful is a good thing for everyone involved. I want the devs to know that their work is appreciated.

      Not everything is a tactical operation to get more subscription purchases - sometimes people like the things they use and want to say thanks and let others know.

I'm surprised that Gemini 2.0 is first now. I remember that Google models were underperforming on Kagi benchmarks.

  • Having your own hardware to run LLMs will pay dividends. Despite getting off on the wrong foot, I still believe Google is best positioned to run away with the AI lead, solely because they are not beholden to Nvidia and not stuck with a 3rd party cloud provider. They are the only AI team that is top to bottom in-house.

How did you choose the 8192 token thinking budget? I've often seen DeepSeek R1 use way more than that.

  • Arbitrary, and even with this budget it is already more verbose (and slower) overall than all the other thinking models - check the tokens and latency columns in the table.
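    For context on what a "thinking budget" is: it is a per-request cap on reasoning tokens, not a model property. A minimal sketch of how a payload with an 8192 token budget might look via Anthropic's Messages API (payload shape per their extended-thinking docs; the model ID and the max_tokens headroom are illustrative assumptions):

    ```python
    def build_request(prompt: str, thinking_budget: int = 8192) -> dict:
        """Assemble a Messages API payload with extended thinking enabled.

        The 8192 default mirrors the budget quoted in the benchmark above.
        """
        return {
            # Model ID is an assumption for illustration.
            "model": "claude-3-7-sonnet-20250219",
            # max_tokens must exceed budget_tokens, since thinking
            # tokens count against the overall output limit.
            "max_tokens": thinking_budget + 4096,
            "thinking": {
                "type": "enabled",
                "budget_tokens": thinking_budget,  # cap on reasoning tokens
            },
            "messages": [{"role": "user", "content": prompt}],
        }
    ```

    The budget is a ceiling, not a target: the model may stop reasoning well before exhausting it, which is why observed token counts and latency still vary across models at the same setting.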

One thing I don't understand is why Claude 3.5 Haiku, a non-thinking model in the non-thinking section, is listed with an 8192 thinking budget.

Do you think Kagi is the right eval tool? If so, why?

  • The right eval tool depends on your evaluation task. The Kagi LLM benchmark focuses on using LLMs in the context of information retrieval (which is what Kagi does), including measuring reasoning and instruction-following capabilities.