Comment by hu3

3 days ago

Open-weight models are neat.

But for SOTA performance you need specialized hardware, even for open-weight models.

$40k of consumer hardware is never going to compete with $40k of AI-specialized GPUs/servers.

Your link starts with:

> "Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago."

I highly doubt an RTX 5090 can run anything that competes with Sonnet 3.5, which was released in June 2024.

> I highly doubt an RTX 5090 can run anything that competes with Sonnet 3.5, which was released in June 2024.

I don't know about the capabilities of a 5090, but you can probably run a Devstral-2 [1] model locally on a Mac with good performance (see the sketch after the links below). Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2]. My impression is that local models have made huge progress.

Coding aside, I'm also impressed by the Ministral models (3b, 8b and 14b) that Mistral AI released a couple of weeks ago. The Granite 4.0 models from IBM also seem capable in this context.

[1] https://mistral.ai/news/devstral-2-vibe-cli

[2] https://www.anthropic.com/news/swe-bench-sonnet
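To make "run it locally" concrete, here's a minimal sketch using the llama-cpp-python bindings against a GGUF quantization. The model filename, context size, and prompt are placeholders for whatever quant you actually download, not specific recommendations:

```python
# Minimal local-inference sketch with llama-cpp-python.
# The model path is a placeholder; any GGUF quant of a ~24b model works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="devstral-small-24b-q4_k_m.gguf",  # hypothetical filename
    n_ctx=8192,        # context window; raise it if you have the RAM/VRAM
    n_gpu_layers=-1,   # offload every layer to the GPU (Metal on a Mac, CUDA on a 5090)
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Write a Python function that parses an ISO 8601 date."}
    ],
    max_tokens=512,
    temperature=0.2,
)
print(resp["choices"][0]["message"]["content"])
```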

  • Thing is, you can pay basically fractions of a cent per query to e.g. DeepSeek Platform or DeepInfra or Z.Ai or whatever and have them run the same open models far cheaper and faster than you could ever build out at home (see the rough cost sketch below this subthread).

    It's neat to play with, but not practical.

    The only story I can see that makes sense for running at home is if you're going to fine-tune a model: take an open-weight model, <hand waving> do things to it, and run that (a sketch of what that typically looks like follows further down). Even then, I believe there are places (Hugging Face?) that will host and run your updated model for cheaper than you could run it yourself.

  • > Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2].

    I've played with Devstral 2 a lot since it came out. I've seen the benchmarks. I just don't believe it's actually better than Sonnet 3.5 for coding.

    It's amazing that it can do some light coding locally. I think it's great that we have that. But if I had to choose between a 2024-era model and Devstral 2, I'd pick the older Sonnet or GPTs any day.
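To put rough numbers on the "fractions of a cent" point above, here's a back-of-envelope sketch. Every figure in it is an assumed placeholder (provider pricing, query size, usage, electricity), not a quote from any provider:

```python
# Back-of-envelope cost comparison: hosted API vs. running the same open model at home.
# All numbers below are illustrative assumptions, not real prices.

API_PRICE_PER_M_TOKENS = 1.00   # assumed blended $/million tokens at a hosted provider
TOKENS_PER_QUERY = 3_000        # assumed prompt + completion size of a typical query

HARDWARE_COST = 2_500           # assumed price of one high-end consumer GPU
HARDWARE_LIFETIME_YEARS = 3
QUERIES_PER_DAY = 200           # assumed personal usage
POWER_COST_PER_DAY = 1.00       # assumed electricity, very rough

api_cost_per_query = API_PRICE_PER_M_TOKENS * TOKENS_PER_QUERY / 1_000_000
days = HARDWARE_LIFETIME_YEARS * 365
home_cost_per_query = (HARDWARE_COST + POWER_COST_PER_DAY * days) / (QUERIES_PER_DAY * days)

print(f"API:  ~${api_cost_per_query:.4f} per query")   # ~$0.0030 under these assumptions
print(f"Home: ~${home_cost_per_query:.4f} per query")  # ~$0.0164 under these assumptions
```

Change the assumptions (heavier usage, cheaper hardware, pricier provider) and the gap narrows, but that's the shape of the argument.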
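And for the "<hand waving> doing things to it" part, the usual shape is a LoRA-style fine-tune on top of an open-weight base. A minimal sketch with Hugging Face transformers + peft; the base model name, dataset file, and hyperparameters are placeholders, not a recipe:

```python
# Minimal LoRA fine-tuning sketch with transformers + peft.
# Base model, dataset file, and hyperparameters are placeholders; a real run needs
# far more care (chat templating, evaluation, and enough VRAM or a quantized base).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"                   # placeholder base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.pad_token or tok.eos_token       # many base tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Train small adapter matrices instead of updating all of the base weights.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder dataset: a JSONL file with one {"text": "..."} record per example.
data = load_dataset("json", data_files="my_domain_data.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # pads and sets labels
)
trainer.train()
model.save_pretrained("my-domain-adapter")  # saves only the small adapter weights
```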

> $40k of consumer hardware is never going to compete with $40k of AI-specialized GPUs/servers.

For a general-purpose LLM, probably yes. For something very domain-specialized, not necessarily.