Comment by weitendorf

6 days ago

So far it's better than Qwen 3.5 on equivalent workloads, and much less expensive. As you mention, Qwen spends way too much time/tokens reasoning, so it ends up being more expensive than you'd expect from its model card (and, in my experience, flaky).

I actually think this model is a Big Deal because there's a whole world out there of people building on top of Qwen and other Chinese models, and now Mistral has just released one of the best generalist FOSS models in its price/size range at an excellent price ($0.60/1M output is a steal). Mistral could potentially grab a lot of that.

Personally I am going to build off of it and invest in their ecosystem now, with this model, because it's definitely worth paying for at the current price. Whether Mistral or some other venture comes out with the next big thing in this category is anybody's guess, but now that labs are converging on more rapid release cycles, I'm hoping Mistral won't be far behind.

The main thing for me, though, is that for small-model use cases, it just doesn't make sense to pay a lot for Haiku/Gemini and other expensive small models that you can't self-host, finetune, or generally build upon. And the cases where I'd want the incrementally better performance of something like Haiku over Mistral, but not badly enough to consider the benefits of tuning or self-hosting inference, are few. At the same time, if you're going to invest in building on top of someone else's product, you want them to be a trustworthy, long-term partner.

I'm excited to give them a shot.