
Comment by anonymid

13 hours ago

I guess the hope is that combining two sub-par coding models (xAI's grok + cursor's composer) and combining the data they have access to, they can build something that can compete with OpenAI / Anthropic in the coding space...

I guess I kinda see it... it makes sense from both points of view (xAI needs data + places to run their models, cursor needs to not be reliant on Anthropic/OpenAI).

I think I don't see it working out... I just don't see an Elon company sustaining a culture that leads to a high-quality AI lab, even with the data + compute.

Have to call out that comment about grok code being sub-par. I used it exclusively when it was free in Cursor and have nothing bad to say about it. And that was months ago. I imagine it's a lot better now.

Wasn’t composer trained on Kimi? Has anyone had a chance to compare the latest Kimi model to composer?

  • Composer-2 is based on Kimi K2.5, but with extensive RL. Cursor estimated 3x more compute on their RL than the original K2.5 training run (some details in https://cursor.com/blog/composer-2-technical-report).

    Composer-2 seems very useful in Cursor, while K2.6, according to Artificial Analysis (AA), seems to be a really useful general model: https://artificialanalysis.ai/articles/kimi-k2-6-the-new-lea...

    • I used to hate on Composer 2 but I'm coming around to it. Opus for the big stuff and multi-file operations, Composer for all the small day-to-day IDE tasks works pretty well for me.

  • I'm going to be brutally honest: I have not found Kimi to be useful at all. It simply cannot compete with what the closed models behind Codex and Claude offer. I don't want to risk using a model outside that ecosystem and introducing variables, as most of my workflow is baked into two or three large-company models.

    • That's interesting. Kimi K2.5 used through KimiCode was comparable to Sonnet in my tests, and is an excellent alternative to Anthropic models.

      That being said, I noticed that Kimi served through OpenRouter providers was trash. Whatever they do on the backend to optimize for throughput really compromises the intelligence of the model. You have to work with Kimi directly if you want the best results, and that's also probably why they released a test suite to verify the intelligence of their new models.

    • On the other hand, I found MiniMax M2.7 a reasonable model that I could trust.

      I guess it really depends on taste.