Comment by Workaccount2

3 days ago

I'm curious what the mental calculus was in concluding that a $5k laptop would competitively benchmark against SOTA models for the next 5 years.

Somewhat comically, the author seems to have made it about 2 days. Out of 1,825. I think the real story is the folly of fixating your eyes on shiny new hardware and searching for justifications. I'm too ashamed to admit how many times I've done that dance...

Local models are purely for fun, hobby, and extreme privacy paranoia. If you really want privacy beyond a ToS guarantee, just lease a server (I know they can still spy on that, but it raises the threshold).

I agree with everything you said, and yet I cannot help but respect a person who wants to do it himself. It reminds me of the hacker culture of the 80s and 90s.

  • Agreed. Everyone seems to shun the DIY hacker nowadays, saying things like “I’ll just pay for it”. It’s not just about NOT paying for it, but about doing it yourself and learning how to do it so that you can pass the knowledge on and someone else can do it.

    • And it's not just about "pay for it" vs. "don't pay for it". It's about needing to pay for it monthly or it goes away. I hate subscriptions. They sneak their way into your life, little by little. $4.99/mo here. $9.99/mo there. $24.99/yr elsewhere. And then at some point, in a moment of clarity, you wake up, look at your monthly expenses, and notice you're paying a fortune just to keep living the way you already do.

      I'm not going to pay monthly for service X when a similar thing Y can be purchased once (or, ideally, downloaded as open source), self-hosted, and kept as your setup forever.

      1 reply →

My 2023 MacBook Pro (M2 Max) is coming up on 3 years old, and I can run models locally that are arguably "better" than what was considered SOTA about 1.5 years ago. This is of course not an exact comparison, but it's close enough to give some perspective.

  • OpenAI released GPT-4o in May 2024, and Anthropic released Claude 3.5 Sonnet in June 2024.

    I haven't tried the local models as much but I'd find it difficult to believe that they would outperform the 2024 models from OpenAI or Anthropic.

    The only major algorithmic shift since then has been towards RLVR (reinforcement learning with verifiable rewards), and I believe it was already being applied during 2023-2024.

  • I don't know about that. Even Devstral 2 running locally feels less competent than the SOTA models from mid-2024.

    It's impressive to see what I can run locally, but they're just not at the level of anything from the GPT-4 era in my experience.

Is that really the case? This summer there was "Frontier AI performance becomes accessible on consumer hardware within a year" [1], which makes me think it's a mistake to discount the open-weight models.

[1] https://epoch.ai/data-insights/consumer-gpu-model-gap

  • Open weight models are neat.

    But for SOTA performance you need specialized hardware, even for open-weight models.

    $40k in consumer hardware is never going to compete with $40k of AI-specialized GPUs/servers.

    Your link starts with:

    > "Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago."

    I highly doubt an RTX 5090 can run anything that competes with Sonnet 3.5, which was released in June 2024.

    • > I highly doubt an RTX 5090 can run anything that competes with Sonnet 3.5, which was released in June 2024.

      I don't know about the capabilities of a 5090, but you probably can run a Devstral-2 [1] model locally on a Mac with good performance (a minimal sketch is at the end of this comment). Even the small Devstral-2 model (24B) seems to easily beat Sonnet 3.5 [2]. My impression is that local models have made huge progress.

      Coding aside, I'm also impressed by the Ministral models (3B, 8B and 14B) Mistral AI released a couple of weeks ago. The Granite 4.0 models by IBM also seem capable in this context.

      [1] https://mistral.ai/news/devstral-2-vibe-cli

      [2] https://www.anthropic.com/news/swe-bench-sonnet
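
      For anyone wondering what "running it locally" actually looks like, here's a minimal sketch using the Ollama Python client. The model tag is an assumption on my part (check the Ollama library for whatever tags Devstral/Ministral/Granite are actually published under):

      ```python
      # Minimal local-inference sketch using the Ollama Python client (pip install ollama).
      # Assumes the Ollama daemon is running and the model was pulled beforehand,
      # e.g. `ollama pull <tag>`. The tag below is a placeholder, not a confirmed name.
      import ollama

      MODEL_TAG = "devstral"  # substitute the tag the Ollama library actually lists

      response = ollama.chat(
          model=MODEL_TAG,
          messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
      )
      print(response["message"]["content"])
      ```

      These local builds are typically ~4-bit quantizations, which is a big part of why they fit in laptop memory at all.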

      2 replies →

    • > $40k in consumer hardware is never going to compete with $40k of AI-specialized GPUs/servers.

      For a general-purpose LLM, probably yes. For something very domain-specialized, not necessarily.

  • With RAM prices spiking, there's no way consumers are going to have access to frontier-quality models on local hardware any time soon, simply because they won't fit.

    That's not the same as discounting the open-weight models, though. I use DeepSeek 3.2 heavily and was impressed by the recent Devstral launch. (I tried Kimi K2 and was less impressed.) I don't use them for coding so much as for other purposes... but the key thing about them is that they're cheap on API providers. I put $15 into my DeepSeek platform account two months ago, use it all the time, and still have $8 left.

    I think the open-weight models are 8 months behind the frontier models, and that's awesome. Especially when you consider you can fine-tune them for a given problem domain...

> I'm curious what the mental calculus was in concluding that a $5k laptop would competitively benchmark against SOTA models for the next 5 years.

Well, the hardware stays the same but local models keep getting better and more efficient, so I don't think there's much difference between paying $5k for online models over 5 years vs. getting a laptop (and well, you'll need a laptop anyway, so why not just get one good enough to run local models in the first place?).
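
A back-of-the-envelope comparison (the $5k figure is from the thread; the subscription tiers are hypothetical, not any provider's actual pricing):

```python
# Rough cost comparison: a one-time $5k laptop vs. paying monthly for hosted models.
laptop_cost = 5_000              # USD, one-time
months = 5 * 12                  # the 5-year horizon discussed above
print(f"Laptop amortized: ~${laptop_cost / months:.0f}/month over 5 years")

for monthly_fee in (20, 100, 200):   # hypothetical plan tiers
    print(f"${monthly_fee}/mo plan: ${monthly_fee * months:,} total over 5 years")
```

At roughly $83/month amortized, the laptop only loses to the cheapest tiers on price, and you still own a laptop at the end.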

  • Even if intelligence scaling stays equal, you'll lose out on speed. A SOTA model pumping out 200 tok/s is going to be impossible to ignore next to a 4-year-old laptop choking along at 3 tok/s.

    Even so, right now is when the first generation of chipsets designed purely for LLM workloads is getting into data centers.

    • > Even if intelligence scaling stays equal, you'll lose out on speed. A SOTA model pumping out 200 tok/s is going to be impossible to ignore next to a 4-year-old laptop choking along at 3 tok/s.

      Unless you're YOLOing it, you can review only at a certain speed, and for a certain number of hours a day.

      The only tokens/s you need is a rate that can keep you busy, and I expect that even a slow 5 tokens/s model, utilised 60 seconds of every minute, 60 minutes of every hour and 24 hours of every day, produces way more than you can review in a single working day.

      The goal we should be moving towards is longer-running tasks, not quicker responses, because if I can schedule 30 tasks to my local LLM before bed, then wake up in the morning and schedule a different 30, and only then start reviewing, then I will spend the whole day just reviewing while the LLM is generating code for tomorrow's review. And for this workflow a local model running 5 tokens/s is sufficient (see the back-of-the-envelope sketch at the end of this comment).

      If you're working serially, i.e. ask the LLM to do something, then review what it gave you, then ask it to do the next thing, then sure, you need as many tokens per second as possible.

      Personally, I want to move to long-running tasks and not have to babysit the thing all day, checking in at 5m intervals.
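
      A back-of-the-envelope sketch of that claim (the 5 tokens/s is from above; tokens-per-line and review capacity are my rough assumptions):

      ```python
      # Can a "slow" local model out-produce a human's review capacity?
      tokens_per_sec = 5
      tokens_per_day = tokens_per_sec * 24 * 60 * 60      # 432,000 tokens if it never idles

      tokens_per_line = 4          # rough assumption for code
      lines_generated = tokens_per_day / tokens_per_line  # ~108,000 lines
      lines_reviewable = 1_000     # generous assumption for one person's daily review budget

      print(f"~{lines_generated:,.0f} lines generated vs. ~{lines_reviewable:,} lines reviewable per day")
      ```

      Even if these assumptions are off by an order of magnitude, an always-busy 5 tokens/s model still outpaces what one person can review in a day.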

    • At a certain point, tokens per second stop mattering, because the time to review stays constant. Whether it shits out 200 tokens a second or 20 doesn't much matter if you need to review the code that comes out.

  • If you have inference running on this new 128 GB RAM Mac, wouldn't you still need another, separate machine to do the manual work (running an IDE, browsers, toolchains, builders/bundlers, etc.)? I cannot imagine you will have any meaningful RAM available while LLM models are running.

    • No? First of all, you can limit how much of the unified RAM goes into VRAM, and second, many applications don't need that much RAM. Even if you put 108 GB to VRAM and 16 GB to applications, you'll be fine (rough budget sketch below).
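
      As a rough illustration of how that 128 GB splits up (the model sizes, the ~4-bit quantization factor and the cache overhead are my assumptions, not measurements):

      ```python
      # Rough unified-memory budget for a 128 GB Apple Silicon Mac running a local model.
      TOTAL_RAM_GB = 128

      def quantized_size_gb(params_billion: float, bytes_per_param: float = 0.6) -> float:
          """Very rough in-memory size of a ~4-bit quantized model."""
          return params_billion * bytes_per_param

      for params in (24, 70, 120):        # hypothetical model sizes, in billions of parameters
          model_gb = quantized_size_gb(params)
          overhead_gb = 4                 # assumed KV cache + runtime overhead
          left_for_apps = TOTAL_RAM_GB - model_gb - overhead_gb
          print(f"{params}B @ ~4-bit: ~{model_gb:.0f} GB model, ~{left_for_apps:.0f} GB left for the OS and apps")
      ```

      Even a fairly large quantized model leaves tens of gigabytes for an IDE, a browser and a toolchain; the headroom only disappears as you push towards the very largest models.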

      2 replies →

I completely agree. I can't even imagine using a local model when I can barely tolerate a model one tick behind SOTA for coding.

That's the kind of attitude that removes power from the end user. If everything becomes SaaS, you don't control anything anymore.

> Local models are purely for fun, hobby, and extreme privacy paranoia

I always find it funny when the same people who were adamant that GPT-4 was a game-changing level of intelligence are now dismissing local models that are both way more competent and much faster than GPT-4 was.

  • Moon lander computers were also game changers. That doesn't mean I should be impressed by the compute of a 30-year-old calculator that is 100x more powerful/efficient in 2025, when we have stuff a few orders of magnitude better.

    For simple compute, the usefulness curve is a log scale: 10x faster may only be 2x more useful. For LLMs (and human intelligence) it's more quadratic, if not inverse log (a 140-IQ human can do maths that you cannot do with two 70-IQ humans; and I know IQ is not a good/real metric, but you get the point).

    • 30-year-old calculators are still good enough for basic arithmetic, and in fact even in 2025 people have one emulated on their phone that isn't more powerful than the original, and they still use it routinely.

      If Claude 3 Sonnet was good enough to be your daily driver last year, surely something that is as powerful is good enough to be your daily driver today. It's not like the amount of work you must do to get paid doubled over the past year or anything.

      Some people just feel the need to always live on the edge for no particular reason.

      1 reply →