Comment by hu3

3 days ago

Open-weight models are neat.

But for SOTA performance you need specialized hardware, even for open-weight models.

$40k of consumer hardware is never going to compete with $40k of AI-specialized GPUs/servers.

Your link starts with:

> "Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago."

I highly doubt an RTX 5090 can run anything that competes with Sonnet 3.5, which was released in June 2024.

> I highly doubt an RTX 5090 can run anything that competes with Sonnet 3.5, which was released in June 2024.

I don't know about the capabilities of a 5090, but you can probably run a Devstral-2 [1] model locally on a Mac with good performance (see the sketch after the links below). Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2]. My impression is that local models have made huge progress.

Coding aside, I'm also impressed by the Ministral models (3b, 8b and 14b) that Mistral AI released a couple of weeks ago. The Granite 4.0 models from IBM also seem capable in this context.

[1] https://mistral.ai/news/devstral-2-vibe-cli

[2] https://www.anthropic.com/news/swe-bench-sonnet
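To make "run it locally" concrete, here's a minimal sketch using the llama-cpp-python bindings against a GGUF quantization. The model filename, context size, and prompt are placeholders for whatever quant you actually download, not specific recommendations:

```python
# Minimal local-inference sketch with llama-cpp-python.
# The model path is a placeholder; any GGUF quant of a ~24b model works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="devstral-small-24b-q4_k_m.gguf",  # hypothetical filename
    n_ctx=8192,        # context window; raise it if you have the RAM/VRAM
    n_gpu_layers=-1,   # offload every layer to the GPU (Metal on a Mac, CUDA on a 5090)
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Write a Python function that parses an ISO 8601 date."}
    ],
    max_tokens=512,
    temperature=0.2,
)
print(resp["choices"][0]["message"]["content"])
```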

  • Thing is, you can pay basically fractions of a cent per query to e.g. DeepSeek Platform or DeepInfra or Z.Ai or whatever and have them run the same open models far cheaper and faster than you could ever build out at home (see the rough cost sketch below this subthread).

    It's neat to play with, but not practical.

    The only story I can see that makes sense for running at home is if you're going to fine-tune a model: take an open-weight model, <hand waving> do things to it, and run that (a sketch of what that typically looks like follows further down). Even then, I believe there are places (Hugging Face?) that will host and run your updated model for cheaper than you could run it yourself.

  • > Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2].

    I've played with Devstral 2 a lot since it came out. I've seen the benchmarks. I just don't believe it's actually better than Sonnet 3.5 for coding.

    It's amazing that it can do some light coding locally. I think it's great that we have that. But if I had to choose between a 2024-era model and Devstral 2, I'd pick the older Sonnet or GPTs any day.
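To put rough numbers on the "fractions of a cent" point above, here's a back-of-envelope sketch. Every figure in it is an assumed placeholder (provider pricing, query size, usage, electricity), not a quote from any provider:

```python
# Back-of-envelope cost comparison: hosted API vs. running the same open model at home.
# All numbers below are illustrative assumptions, not real prices.

API_PRICE_PER_M_TOKENS = 1.00   # assumed blended $/million tokens at a hosted provider
TOKENS_PER_QUERY = 3_000        # assumed prompt + completion size of a typical query

HARDWARE_COST = 2_500           # assumed price of one high-end consumer GPU
HARDWARE_LIFETIME_YEARS = 3
QUERIES_PER_DAY = 200           # assumed personal usage
POWER_COST_PER_DAY = 1.00       # assumed electricity, very rough

api_cost_per_query = API_PRICE_PER_M_TOKENS * TOKENS_PER_QUERY / 1_000_000
days = HARDWARE_LIFETIME_YEARS * 365
home_cost_per_query = (HARDWARE_COST + POWER_COST_PER_DAY * days) / (QUERIES_PER_DAY * days)

print(f"API:  ~${api_cost_per_query:.4f} per query")   # ~$0.0030 under these assumptions
print(f"Home: ~${home_cost_per_query:.4f} per query")  # ~$0.0164 under these assumptions
```

Change the assumptions (heavier usage, cheaper hardware, pricier provider) and the gap narrows, but that's the shape of the argument.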
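And for the "<hand waving> doing things to it" part, the usual shape is a LoRA-style fine-tune on top of an open-weight base. A minimal sketch with Hugging Face transformers + peft; the base model name, dataset file, and hyperparameters are placeholders, not a recipe:

```python
# Minimal LoRA fine-tuning sketch with transformers + peft.
# Base model, dataset file, and hyperparameters are placeholders; a real run needs
# far more care (chat templating, evaluation, and enough VRAM or a quantized base).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"                   # placeholder base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.pad_token or tok.eos_token       # many base tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Train small adapter matrices instead of updating all of the base weights.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder dataset: a JSONL file with one {"text": "..."} record per example.
data = load_dataset("json", data_files="my_domain_data.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # pads and sets labels
)
trainer.train()
model.save_pretrained("my-domain-adapter")  # saves only the small adapter weights
```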

> $40k of consumer hardware is never going to compete with $40k of AI-specialized GPUs/servers.

For a general-purpose LLM, probably yes. For something very domain-specialized, not necessarily.