Comment by elteto

18 hours ago

Local models will never compete with large SOTA models, in the same way an iPhone doesn't compete with supercomputers doing nuclear simulations.

Their paths will diverge. SOTA models will probably end up locked down and accessible only to state actors because of how expensive they will be to run (this has already started with Mythos).

> SOTA models will eventually be locked down

That might be true for US-based providers, but I don't see China turning closed source anytime soon.

A lot of Chinese labs come out of big, non-AI-focused cloud businesses (Alibaba, Tencent, Huawei) that want new models with higher benchmark scores and lower inference cost. They don't care if the competition gets better, because everything is open and they can build off each other's tech, and if anything goes wrong they have other profitable services to fall back on instead of depending on LLMs alone, like Anthropic does.

The business culture is also very different. In VC-backed America you would get laughed out of the room for saying "there is no moat, we just do the same thing as everyone else but better." You need to show infinite potential growth and lock everything down to prevent competition, but you can raise millions to start with no customers and no profits. In China it's all about the real money: nobody cares whether your margin is 10 or 90 percent as long as you stay profitable. The LLM providers are profitable, so they keep their business model.

You don't need one huge model that does everything. You need smaller, specialized models that are very good at specific tasks and collaborate among themselves.

The fact that we see stagnation in parameter counts suggests that capability does not scale linearly with model size; it's more of an S-shaped curve. The middle of that curve was Claude 3.5. Since then, progress has been more about integrating and collaborating with different systems.

It's a big assumption that larger models bring any measurable benefit in the long term. There's a point where a bigger model isn't worth the expense, and we don't know where that point will be as both models and hardware improve.

We do know, however, where evolution has landed with our own brains. That's probably not comparable, yet it's the only reference point I can see for making any kind of prediction at all.

Current local models already compete.

  • A Qwen3.6-35B-A3B (or whatever its full name is) running on a 3090 can, at the very least and with very little fine-tuning, compete with Haiku and blow away GPT-4.1 (i.e., the cheap models).

    It might keep up with Sonnet 4.5 with some tinkering.

    But long story short: it seems to offer better performance and similar quality, at the cost of trailing the cloud models by a year or so. It's the same way you can self-host faster/easier/cheaper than cloud hosting, if you are okay with the trade-offs.

    I'm returning my 3090 soon for an R9700 after some more basic benchmarking, since the extra VRAM should improve my results further.

    • > It might keep up with Sonnet 4.5 with some tinkering.

      I would love to see that. I've been using Qwen3.6 35B and the dense 27B, and both are too slow, with not-so-great results on agentic coding tasks. They're OK, but not impressive. I had better luck with the BF16 and Q8 quants than with the Q4 from unsloth (I really love what unsloth is doing in this space). Another problem I had with Qwen, which I never encountered with Sonnet: even the BF16 gets stuck and needs a "continue task" prompt from time to time, and the lower quants are even worse in that regard.

      If you get some interesting results, I would love to read about it!

      1 reply →
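The quant trade-offs in the exchange above (BF16 vs Q8 vs Q4 on a 24 GB card) can be sketched with back-of-the-envelope arithmetic. This is weight-only accounting with an assumed ~35B parameter count; the KV cache, activations, and runtime overhead all add more on top:

```python
# Rough, weight-only memory estimates for a ~35B-parameter model at
# common quantization levels. Real usage is higher: this ignores the
# KV cache, activations, and runtime overhead.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return params_billion * bits_per_weight / 8

PARAMS_B = 35  # assumed parameter count for the model discussed above

for name, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{name}: ~{weight_gb(PARAMS_B, bits):.1f} GB")
```

On these assumptions, BF16 needs about 70 GB and Q8 about 35 GB, so neither fits in a 3090's 24 GB without offloading to system RAM, while Q4 at roughly 17.5 GB fits with some headroom for the KV cache. That's one plausible reason the lower quants get reached for on consumer cards despite the quality loss noted above.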

You are missing the point. The parent says that winning the market requires economical models more than SOTA models. Whoever is running those nuclear simulations is not making as much money as Apple.