Comment by cmrdporcupine

3 days ago

Coding, via something like Claude or Codex, will likely always be something best done by hosted cloud models simply because the bar there can always be higher. But it's already entirely possible to run local models for chat and research and basic document creation that can compete perfectly fine with the cloud models from 6 months to a year ago. The limitation at this point is just the cost of RAM.

This week's release of the new smaller Qwen 3.5 models was interesting. I ran a 4-bit quant of the 122b model on my NVIDIA Spark, and it's... pretty damn smart. The smaller models can be run at 8 bits on machines at very reasonable speeds. And they're not stupid. They're smarter than "ChatGPT" was a year or so ago.

AMD Strix Halo machines with 128GB of RAM, which can run these just fine, can already be bought off the shelf for not-insane prices. Same with M-series Macs.
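
For a rough sense of why a 4-bit quant of a 122B model fits on a 128GB machine, here is a back-of-the-envelope sketch. It counts only the weights at the nominal bit width; real quant formats store extra scale data, and the KV cache and OS need room too, so the 20% headroom figure and the function names are my own assumptions, not measurements.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights plus an assumed headroom fraction fit in ram_gb.

    The 20% default headroom is a guess covering KV cache, runtime, and OS.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + headroom)
    return needed <= ram_gb


if __name__ == "__main__":
    for label, params, bits in [("122B at 4-bit", 122, 4),
                                ("122B at 8-bit", 122, 8),
                                ("30B at 8-bit", 30, 8)]:
        gb = weight_memory_gb(params, bits)
        print(f"{label}: ~{gb:.0f} GB of weights, "
              f"fits in 128 GB: {fits_in_ram(params, bits, 128)}")
```

By this estimate the 4-bit 122B weights come to about 61 GB, while the same model at 8 bits would need about 122 GB for weights alone, which is why the bigger model gets the smaller quant.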

Once the supply shocks make their way through the system, I could see every consumer Mac or Windows install just coming with a 30B-param or even larger model onboard that is smart enough for basic conversation and assistance, and is equipped with good tool-use skills.

I just don't see a moat for OpenAI or Anthropic beyond specialized applications (like software development, CAD, etc). For long-tail consumer things? I don't see it.

daxfohl 3 days ago

Even for coding. I mean, there's what, maybe a few thousand common useful technologies, algorithms, and design patterns? A million uncommon ones? I think all that could fit in a local model at some point.

Especially if, for example, Amazon ever develops an AWS-specific model that only needs to know AWS tech and maybe even picks a single language to support, or maybe a different model for each language, etc. Maybe that could end up being tiny and super fast.

I mean, most of what we do is simple CRUD wrappers. Sometimes I think humans in the loop cause more problems than we solve: overindexing on clever abstractions that end up mismatching the next feature, painting ourselves into fragile designs we can't fix due to backward compatibility, using dozens of unnecessary AWS features just for the buzz, etc. Sometimes a single monolith with a few long functions with a million branches is really all you need.

Or, if there's ever a model architecture that allows some kind of plugin functionality (like LoRA but more composable; like Skills but better), that'd immediately take over. You get a generic coding skeleton LLM and add the plugins for whatever tech you have in your stack. I'm still holding out for that as the end game.
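
To make the plugin idea a bit more concrete, here is a toy sketch of how plain LoRA-style adapters stack today: each adapter is a low-rank pair (B, A) whose product gets added onto a base weight matrix. This is ordinary additive merging in pure Python on tiny matrices, not the "more composable" architecture I'm hoping for, and the adapter names are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_adapters(base, adapters):
    """Return base + sum(scale * (B @ A)) over all (B, A, scale) adapters."""
    merged = [row[:] for row in base]  # copy so the base stays untouched
    for B, A, scale in adapters:
        delta = matmul(B, A)  # low-rank update with the same shape as base
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Base weights of the generic "coding skeleton" (a 2x2 toy matrix).
base = [[1.0, 0.0],
        [0.0, 1.0]]

# Two hypothetical rank-1 plugins, e.g. one per technology in your stack.
aws_adapter = ([[1.0], [0.0]], [[0.0, 1.0]], 1.0)
rust_adapter = ([[0.0], [1.0]], [[1.0, 0.0]], 0.5)

merged = apply_adapters(base, [aws_adapter, rust_adapter])
print(merged)
```

Naively summing deltas like this lets adapters step on each other when they touch the same weights, which is roughly the problem a more composable scheme would have to solve.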