← Back to context

Comment by dofm

1 day ago

The maths there is pretty undeniable, but it is not where I'd make the split. Having a machine that can run some modest local LLMs, like the Gemma 4 12B, is really worth it.

I don't know how much serious hands-free agentic coding I will ever do on my MacBook alone, but I do know that I would not have got so far into understanding this without tinkering with local models, llama.cpp, LM Studio, and LM Studio and all that.

I totally struggled to find the right frame of mind to explore any of this stuff without feeling defeated and bamboozled. Because it's just huge, exhausting, jargon-drenched, unknowable, and I am over the hill at fifty-plus.

Until, that is, I could poke around with setting it up on my own (secondhand) machine, watching the API calls, understanding some of the terminology. I didn't even buy the machine for that; it's just adequate to the task.

The Neo is too small to really get much benefit from this opportunity to make it more visceral and knowable.

> Having a machine that can run some modest local LLMs, like the Gemma 4 12B, is really worth it.

Cloud models are (much) faster, they don't consume so much power/generate heat, they have much bigger (LLM) context, they're much more precise and they have a much wider (engineering) context of the given problem.

Except privacy and use cases that are blocked by cloud models (e.g. reverse engineering), local LLMs are currently an expensive toy.

When I try to program with a local LLM (I'm on a 32/128 GB system), I end up wasting time compared to a cloud LLM.

  • Again, I would not argue against any of this.

    And I can't say that I won't switch to openrouter (even just for the same models) at some point.

    But one of the things I have found about my own process learning is that some lessons only come to you when you make yourself available to them. And if that means doing things the difficult way, that is what you should do.

  • > currently

    The interesting question is whether that gap will narrow, and if so, how much, and on what timescale.

    The exact answer to this question is not knowable, but if you are the kind of person who comes to a site called "hacker news", and you think there is a nonzero chance that the answer is that yes, the gap will narrow and this won't always be an expensive toy, then now seems like a pretty great time to get in the game and start exploring the capabilities.

  • Same. Local LLMs are fun to experiment with, but when I want generated code of a sufficient quality, I use a cloud LLM.

  • I agree completely. I think local AI is best limited to purpose built SLMs; all this craze around running quantized coding LLMs has taken the attention off SLMs.

  • > Cloud models […] don't consume so much power/generate heat

    I do realize the cloud is just someone else’s computer right? Power goes in, tokens and heat come out - just in another place

    • The cloud computers produce more tokens per watt. That said, if you have a computer at home running 24/7 for other reasons and you also can use it for some LLM work, why not.

  • Anything done local will likely come at higher cost and at scale with less energy efficiency and commodity, with less possibility to fine tune engineer deeply on wider horizon of issues.

    That's never the point of keeping local alternatives though.

    • Right.

      For me this dates all the way back to installing Slackware 1.0 (0.99pl12!) on an offline 486SX rather than just using the internet-connected workstations in the lab.

      Here, I already had a Mac that was powerful enough to run a local LLM, so now I do, because I can.

Exactly. The distinction between the various layers in "AI" systems is pretty vague to the newcomer. What is the "model" vs. the engine "running" it vs. weights?

I don't recall any previous tech stack that was barfed onto the scene with so little background or reference material, going from zero to endless undefined jargon... and no primer in sight.

For people who demand an understanding of their tools, it's a lot of work. I recognize the value of "AI" in performing the tasks I'd have to do manually; for example, keeping the data structures of my front- and back-ends in sync in a project. But do I want to interrupt my development and take weeks off to digest all of these tools?

And if I do, I want to run the show and fully understand it. And like you, I think that's best done locally.

  • The most unexpected thing for me was kind of philosophical in a ‘holy shit’ way.

    Cloud models still feel ‘magic’, like you send a request off and get something back, like it’s something ‘special’. I used to joke that ChatGPT might be some kind of mechanical turk underneath.

    Watching a model run local on your own machine hits different — you realise that yes, it IS just a computer program. Which for me actually makes me appreciate the leap we’ve made MORE, not less. From an information-theoretic point of view, LLMs really are something special.

    The fact that they are just programs, that I’ve now experienced first-hand that they’re just programs, makes all those questions around consciousness and intelligence much more interesting.

    • Yep — it hasn't changed how I feel about what LLMs are capable of (and very much not capable of) but this visceral feeling is fascinating.

      Like, just watching a computer I already owned act like ChatGPT with the wifi disconnected.

      It was the first time I stopped feeling quite so helpless, somehow.

    • Yeah, it's been fun for me running models (mostly Qwen 3.6 27B) on my 48GB M4 MacBook Pro. When i'm using it to run models, it's basically unusable for anything else - I actually do the work on my Macbook Neo. Took me a while to figure out why the models couldn't figure out how to make tool calls - because LMStudio by default uses a 32K input window, which is smaller than OpenCode's prompt, so half of the instructions were being pruned from the middle!

      4 replies →

  • For the most part you can just download LM Studio and go from there. It provides a chat interface and an easy-to-use interface to browse, load and use LLM models. The engine: it is abstracted away by LM Studio, if you want to dig deep it's llama.cpp as the runtime. Weights are the files what you download, they are the models for practical purposes.

    • I definitely would recommend LM Studio as a learning environment, because it surfaces a bunch of things in relatively clear-minded ways. I am very grateful for it.

Thanks for posting this. This is the tinkerer mentality. It is not for everyone, but certain things can only be learned in that way. It is the best antidote to AI paranoia. There is much that does not transfer between frontier models and local ones. There is that. But you can not tinker as much as you can with the former.

I agree with the learning aspect, but I have another motivation. I suspect that closed models might become too expensive to run for personal hobbyist use. I’ve been planning to buy a 64GB machine just to allow the limited local models this enables.

It's also great to have capability to run local models for more brute force tasks. Because you can change the system prompt, you can get local LLMs to do all kinds of high volume tasks without burning through tokens on a hosted model.

Just one example, I needed a bunch of images tagged and organised, with a local vision capable model I could pretty easily set that up and leave it running overnight.

I already had the GPU and memory for gaming, so it was at no cost for me to start running local models. But I feel the long term writing is on the wall, local models will only make more and more sense as they get better and more efficient.

> The maths there is pretty undeniable, but it is not where I'd make the split. Having a machine that can run some modest local LLMs, like the Gemma 4 12B, is really worth it.

Seems like a GPU with 12GB+ VRAM is going to be a much more affordable way to achieve that? Even a B580 should get reasonable perf there.

  • No idea. I am a Mac guy, have been for a very long time. I buy them secondhand as a rule.

    I guess I would build a powerful home LLM server if I was convinced I really needed one for my purposes for some agentic application or other. At the moment I'd prefer to ride this out with a machine that is also an excellent Mac.

> Having a machine that can run some modest local LLMs, like the Gemma 4 12B, is really worth it.

Agree having a powerful machine is really worth it in general for professionals, but strong disagree that running local LLMs has anything to do with it. It's hard enough as it is getting a good ROI on your time/money prompting/wrangling with frontier models. IMO leaning on the comparatively limited capabilities of local LLMs is best avoided in favor of keeping your own personal coding skills fresh and continuing to learn new ones.

  • I'm not that bothered about my coding skills, which are fine, and pretty up-to-date considering I'm now an old bloke. I am bothered about building an instinctive understanding that helps me deal with my anxieties and decide whether I want to carry on with this working life or quit.

    I needed to do this, this way, in my own time, to put my brain back together. It has worked for me, which is why I recommend it.

    YMMV.

    • Unfortunately the local llm bunch is not the most emphatetic one in my experience: you are somehow "expected" to immediately know all this stuff and god forbid you ask the wrong question. I've never seen or felt this level of bullying and weird vibes over tools and LLM models. "My setup works for you or beat it".

      2 replies →

  • Continuing to learn new ones, like what?

    To me, "how do contemporary AI systems work and interact with contemporary hardware and how can I best take advantage of their capabilities?" is the set of skills that are worth learning at this moment.

    What else is there? New / additional programming languages? New / additional database systems? frameworks? orchestrators? cloud provider / infra tooling? architectural patterns?

    I dunno, all of this seems really boring and "been there done that" to me at this moment in time!

    • Yes, that all tracks, and all of those skills are worth maintaining and improving. Great to tinker with LLMs locally hands-on to learn, and having a powerful enough machine to enable that to a reasonable degree is just one of many reasons why it's worth it. I'm just saying that IMO "how can I best take advantage" lands firmly in the bucket of only cloud-hosted frontier models being worth my time. I would speculate that holds true for a large portion of the wider HN audience but YMMV of course.

      1 reply →

I'd say give it some time for the dust to settle. This field badly needs standardized benchmarks even before the conversation around model goodness can start.

I just got Claude to download and install all the models and servers and agents and prepare all the launch scripts for me... no need to learn, just ask it to do it for you

  • Right, but I am a middle-aged bloke who is experiencing existential angst about whether I can carry on in this industry.

    I have a pretty deep, maybe paranoid need to be confident I have an intrinsic understanding, and I have found in my life that lessons come to you when you make yourself open to learning.

    So I need to build on top of what I know, taking as much of the hard way as I can bear to take at any one time — it has to be not quite difficult enough to put me off.

    I can't really explain what I have learned this way that is different, but I feel it in a way that I wouldn't if I'd simply pushed a button.

    For the same reason, I have a really basic 3D printer that I've set up myself, set up Klipper, configured how I want it, learned how to calibrate, all that. And now I can say that I feel I have an understanding of 3D printing. I could hold my head above water in a discussion with a real expert, maybe find work in an adjacent field where my insights would keep me grounded.

    I can afford a really good printer that has all that set up, and more, has no problems. But I'd just be someone who has a 3D printer.

    (Also who am I kidding about the existence of a printer with no problems)

    • This really resonates with me, and I'm only a decade and change into my career. I use claude a lot day to day. I try to use it sensibly, making me more productive and produce better work. I'm also trying not to lose understanding along the way. I want to be able to actually talk to the conclusions I'm reaching.

      I have colleagues that seem perfectly content to delegate too much to the agents, and it saddens me. It feels like there will be swaths of engineers that didn't train some of the critical thinking skills that I take for granted.

      I certainly see it in slack discourse around anything more complicated than a feature implementation. Maybe I'm just cynical. Time will tell, I suppose.

      4 replies →

    • Yeah, this is just the engineer's mindset. It's not surprising that this is a popular view here, even if it is not (and does not need to be) the mainstream perspective.

      2 replies →

  • I don't necessarily think your answer is wrong for all people, but if you work in software... how do you plan to differentiate yourself from everyone else out there, if the depth of your understanding is "Claude can do it for me"?

    • This ultimately is the discussion I am here for.

      I mean one of the things I use a local LLM for, because I can, is to generate starter documentation. But I ask it to — I want it to give me overviews, plans, all that. It can make something bespoke for me.

      I guess I could also ask it to do the work. But where do you draw the line?

      The universal labour-saving device is the great provocation of the next 100 years I think, and both Star Trek and Wall-E have grappled with it.

  • >no need to learn, just ask it to do it for you

    And that's how skills die.

    • And why is this skill important, if a machine can do it ? What's the last time you ploughed your field with oxen ?

    • Except with AI models it's possible to make a backup of them creating a permanent artifact of a skill.

    • When's the last time you shoed a horse?

      The reason I delegate so much of local LLM installation and administration to Claude Code is simply because there's no point learning practical things that will work completely differently in a couple of years, or in memorizing procedures that I'll forget long before I need to perform them again.

      No longer having to sweat all the details is a Good Thing, not a Bad Thing.

      11 replies →

  • Then what is the point of ddalex?

    • I think if you really don't feel the need to know the "why" of everything, sometimes this might be the right approach. It is quick, pragmatic, gets you started.

      Maybe my biggest problem with the world of agentic AI, and the reason I am putting myself through learning it the way I am, is that the need to know the "why" of everything is so fundamental to me, that I don't know if there is any point to me without it.

      So this is really the only way I know how to proceed.

      3 replies →

> I totally struggled to find the right frame of mind to explore any of this stuff without feeling defeated and bamboozled.

I found LM studio to be a nice starting point. Frindlier and more featureful than Ollama and not as intimidating as llama.cpp (though you will want to use that eventually)

  • LM Studio is also nice because of the way the interface explains things; parameters have explanations and hints. It has been designed by people who really care about making it understandable.

    I tried Ollama but I've settled on Unsloth Studio generally; once things really settle down I'll just run the llama-server UI, which is pretty nice.

    A friend is tinkering with LLMs for amusement on a 16GB Raspberry Pi 5, and when I explained that llama.cpp now had a typical web chat interface he was so happy — it's amazing what the "table stakes" are now.

Honestly your best bet is to buy a $20 Claude subscription, ask Claude to set it all up with Pi and llama.cpp and come back in 20 minutes after a cup of coffee. This is also a good idea because it will help set expectations of what a local model can do vs. a frontier model.

  • This is what I did after struggling to get llama.cpp working at a decent speed on my M1 Macbook. The secret is to very specific with your needs and targeted in what you are using llama.cpp for. Mine setup is just about strictly for qwen3-coder and now, I get a fairly decent speed out of it. I also installed Cursor to check Claude and it all worked out well.

    • Are you talking about Qwen3 Coder 30b a3b Instruct from August 2025, which is a non-reasoning model? Or the more recent "Qwen3 Coder Next" from Feb this year with 80b params, 3b active? I found Qwen3 coder next to be quite good on openrouter [1], but couldn't run it locally.

      [1] https://openrouter.ai/qwen/qwen3-coder-next

    • I don't know why we're even talking about Qwen3.6 for writing code when qwen3-coder exists. My experience is there's no contest. I'm using 30b with 96k context on a dedicated server.

      1 reply →

I've setup to local paradigms for local coding:

- opencode with it's webui

- deer-flow with it's research/powered front end

They both run websites so you don't have to baby sit them (eg, keep your mac open). I've build a pdf compressor over a few days by first having deer flow try and research the frameworks and pipeline. It stalls out because its not really a fluid programmer. Once it stalls out, I transferred it (manually for now) to opencode and it's refactoring it because it's just a collective bundle of sticks and it needs a lot of testing to tweak out the limited scop context. LLMs can't really hold large scopes (locally anyway, from what I've read from HN, it's possible with longer context).

It'll complete in a few days with maybe 3-4 hours of full attention interaction, but it's running 3x that without my attention. Obviously, if I paid more attention it'd run quicker, but since it's local, it's not pumping out large volumes of code, it's mostly looping over tests and capabilities as observed.

It's running Qwen3.6 35B MoE on a AMD 128GB strix halo. If I switched to the dense models, perhaps it'd be smarter, but the trade off seems to be much slower gen.

  • > - opencode with it's webui

    Have you tried Paseo?

    I have opencode in a VM, and the paseo daemon running in the VM, and then the Paseo Mac app. Really nice.

    (You can also use the Opencode GUI to frame a remote opencode web interface)

    • You can also just add OpenCode web as a PWA, if that's what you mean by "frame".

      I'm gonna check out paseo, but am not looking forward to all the ram the agent needs + all the ram paseo needs

      1 reply →

> I totally struggled to find the right frame of mind to explore any of this stuff without feeling defeated and bamboozled. Because it's just huge, exhausting, jargon-drenched, unknowable, and I am over the hill at fifty-plus.

Hello, my brother, just know that you have a fellow passenger in life at the same age who thinks the same thing. I agree that the local stuff is helping my understanding a LOT.

However, my gut feel as someone who got to experience the TeleBomb after the DotBomb is that the obfuscation is INTENTIONAL--it's neither you nor your age. I remember asking people to explain to me what the OC-768 startup endgame was when roughly 10 OC-768 links could carry the world's traffic at the time--and everybody giving me blank looks. The AI Bubble has the EXACT same feel as the Telecom Bubble--just bigger.

What I really wish is that I could find a VPS-type provider where I could toss things into their NVIDIA/AMD machines for an hour or two. Alas, all of the providers seem to want massive paperwork and huge minimum purchases.

I can't wait for the bubble to pop so that we mere mortals can finally build with this stuff.