Comment by logicprog

3 months ago

This is why I like Kimi K2/Thinking. IME it pushes back really, really hard on any kind of non-obvious belief or statement, and it doesn't give up after a few turns; it just keeps going, iterating, refining, and restating its points if you change your mind or take on its criticisms. It's great for having a dialectic around something you've written, although somewhat unsatisfying because it'll never agree with you. But that's fine, because it isn't a person, even if my social monkey brain feels like it is and wants it to agree with me sometimes. Someone even ran a quick-and-dirty analysis of which models are better or worse at pushing back on the user, and Kimi came out on top:

https://www.lesswrong.com/posts/iGF7YcnQkEbwvYLPA/ai-induced...

See also the sycophancy score of Kimi K2 on Spiral-Bench: https://eqbench.com/spiral-bench.html (expand details, sort by inverse sycophancy).

In a recent AMA, the Kimi devs even said they explicitly use RL to train it away from sycophancy, and in their paper they talk about intentionally trying to get it to generalize its STEM/reasoning approach to user interaction as well. It seems like this paid off: it's the least sycophantic model I've ever used.

Which agent do you use it with?

  • I typically use K2 non-thinking in OpenCode for coding, and I haven't found a satisfactory chat interface yet, so I use K2 Thinking in the default synthetic.new (my AI subscription) chat UI, which is pretty barebones. I'm going to start trying K2T in OpenCode as well, but I'm actually not a huge fan of thinking models as coding agents; I prefer faster feedback.

    • I'm also a synthetic.new user, as a backup (and for larger contexts) for my Cerebras Coder subscription (zai-glm-4.6). I've been using the free Chatbox client [1] for about six months and it works really well as a daily driver. I just tested the Romanian football player question with 3 different models (K2 Instruct, DeepSeek Terminus, GLM 4.6), and they all went straight to my Brave MCP tool to query the web and all replied correctly with the same answer.

      The issue with OP and GPT-5.1 is that the model may decide to trust its own knowledge and not search the web, and that's a prelude to hallucinations. Asking for links to the background information in the system prompt helps make the model more "responsible" and more likely to invoke tool calls before settling on an answer. You can also start your prompt with "search for which Romanian player..."
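
      If you're calling the API yourself rather than going through a chat client, you can also make the search non-optional instead of just nudging it via the system prompt. Here's a minimal sketch, assuming an OpenAI-compatible endpoint and the official openai Python SDK; the base URL, model id, and web_search tool are illustrative stand-ins for whatever your setup exposes (in Chatbox that role is played by the Brave MCP tool), and not every backend honors tool_choice="required":

              from openai import OpenAI

              # Illustrative endpoint and key; point these at your actual provider.
              client = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")

              # Hypothetical search tool schema; stands in for your real search tool.
              tools = [{
                  "type": "function",
                  "function": {
                      "name": "web_search",
                      "description": "Search the web and return results with links.",
                      "parameters": {
                          "type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"],
                      },
                  },
              }]

              resp = client.chat.completions.create(
                  model="kimi-k2",  # illustrative model id
                  messages=[{"role": "user", "content": "Which Romanian player ...?"}],
                  tools=tools,
                  tool_choice="required",  # must call a tool, can't answer from memory
              )
              print(resp.choices[0].message.tool_calls)

      With tool_choice="required" the model has to emit a tool call on its first turn, so it can't skip the search and answer from stale weights.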

      Here's my chatbox system prompt

              You are a helpful assistant; be concise and to the point. You are writing for smart, pragmatic people; stop and ask if you need more info. If searching the web, always add plenty of links to the content you mention in the reply. If asked explicitly to "research", then answer with a minimum of 1000 words and 20 links. Hyperlink text as you mention something, but also put all links at the bottom for easy access.
      

      1. https://chatboxai.app

  • I don't use it much, but I tried it out with okara.ai and loved their interface. No other connection to the company.