Comment by cmiles8

4 hours ago

We’re not there yet, but the obvious endgame of the present bubble insanity is that open models running on local hardware and devices become “good enough” for most use cases. That will completely implode what’s going on at the moment in tech.

Happened to me. CoPilot changing prices prompted me to cancel my CoPilot subscription and install a local coding model running entirely in VRAM. Will call Claude APIs when I get really stuck, but I should be able to handle 80% of my needs with a dumber local model.

For a long time, too. Programming languages rarely change much and techniques rarely change, so I should be able to use said model for, I hope, at least five years; and if at any time they optimize local models to cram even more intelligence into the same amount of VRAM, I can upgrade to that.

I like this path.

  • > Will call Claude APIs when I get really stuck, but I should be able to handle 80% of my needs with a dumber local model.

    I experiment with all of the local models I can fit into 32GB of VRAM and I have subscriptions to multiple SOTA providers.

    The difference between them is very large, unfortunately. The local models can handle small tasks and refactoring mostly okay, but doing anything challenging with them becomes a waste of time. Unfortunately the waste isn’t immediately obvious because they will come back with something that looks like it works, but then on closer examination I need to throw it out and reset them in a usable direction.

This. OpenAI and Anthropic are ultimately compute infrastructure plays and not really AI. Everyone will have models, they'll have the ability to run them. This is why the GPU shortage is in their favor.

  • And like Google and Meta, these companies are going to morph into advertising giants. Advertising is an economic black hole and it eats everything that comes close.

    • Embedding ads in LLM responses is something researchers are having a lot of trouble figuring out right now.

      I have seen the results of some early attempts. It fails in such hilarious ways that all these companies are scared of productizing it. But once someone does it, the taboo is broken and everyone else will follow suit immediately.

  • Maybe. But if we can all run our own model locally in 2 years on commodity hardware OpenAI and Anthropic will start to look like WeWork during the pandemic

    • I agree with you that they are headed in that direction! The GPU shortage is (I think) similar to the pandemic era hiring binge. It's less about the extra compute and more about denying the GPUs to potential competitors. They're racing against time to find something that gives them a real moat (gen ai I guess?) and they are trading money for time.

      This is also why the money being poured into datacenters isn't going to result in as much development as you think. It's about leveraging other people's money to lock down more future hardware. This is going to end exactly like the fiber build-out in the 2000s. Eventually that fiber got used but the folks who originally paid for it got hosed.

  • How does that view align with Anthropic leasing data centers from others?

    I don’t know OpenAI’s infra, but to the extent they are buying GPUs and building data centers with their own money, that sounds like a bad move.

    Satya has mismanaged the AI transition in many ways, but one thing he got right is that models are commodities, and the value is in applications that apply them to create user benefit. I agree that any company trying to build a moat with a model is not long for this world.

  • Do you think there will still be an incentive to release weights in that scenario? Everyone will have models only if there continue to be companies releasing weights.

    • Companies won't, but I suspect this is a niche that something else open source-y will fill. Maybe orgs like Wikimedia or the Internet Archive, maybe some hackers just making things, maybe nation states that want to disrupt other players. Also model training will get better and better both on the algo and the hardware side. You can easily see a world where you might be able to train a good enough model on a home lab in a few days.

If you are willing to spend about 2000 on GPUs, we are almost there.

In my opinion, the bottleneck is the package management layer and not the model capabilities and performance.

I have been an avid Linux user for decades, and if I find it confusing and painful, something is missing.

You just described the absolute nightmare scenario for the newly minted trillion-dollar companies whose only hope is for enterprises and SMB to move all their business processes to the cloud, with employees competing at token maxxing.

I wouldn't say "completely implode", too much money was poured into it, but it's clear we're heading in that direction. You get a model that is "good enough", plus privacy, plus savings in the long term.

Paradoxically, the better results we get from general harnesses for coding agents, the less moat Claude and co. get. It's unbelievable how fast some open models outpaced frontier models of just a few months ago.

this is sorta like saying that being able to run your blog on your laptop will completely implode the cloud business

  • This is actually what happens.

    I run my word processing software on my Apple II (a total joke of a computer) instead of running it on the WANG.

    I run my bookkeeping software on VisiCalc instead of the IBM.

    I run my simulation software on my IBM PC (I even paid for the 8087!) instead of the VAX.

    Moore's law has, at least so far, allowed the pioneers with toy computers to grow their toys big enough to solve "big boy" problems after some time has allowed the toy computers to be faster and the pioneers have scaled their crappy home-grown solution to solve their 60% of the problem that was originally solved by some enormous complex system.

    Eventually the toy infrastructure gets expensive and solves 90-120% of the "big iron" problem space, but it also grows to cost as much as the big iron solution, but then a new generation of toy software and toy systems emerges to disrupt the "big iron" systems.

    See also http://www.catb.org/jargon/html/W/wheel-of-reincarnation.htm...

    • Underappreciated requirement for this to work in post-cloud times: open source

      If a vendor can SaaS a solution, then enterprise is generally happy (they don't want to have to hire folks for maintenance), and that completely locks out any ability to run locally.

      Between enterprise's ambivalence and the obvious financial incentive to vendors, you get SaaS-only products.

  • It's a huge difference. If you had AI sufficiently good running locally on a phone, you could devise workflows for things like basic digital hygiene, technical assistance, and tedious tasks like inbox management, image sorting, device updates, and so on. Privacy and security get a big boost past some local competence threshold, and we're nearly there.

    Make the local AI competent enough to do good image generation and editing, realtime voice and music generation, handle agentic tasks with a framework like Hermes, and you can take your AI places to do tasks in contexts that are inaccessible to or inappropriate for cloud.

    Frontier big platform models will be the best, but there's a level of "good enough" for local uses that we're already seeing flourish, and "good enough" for the average joe is almost here.

    • Phones and laptops are terrible devices for local AI, way too constrained by bad thermals and small batteries. Mini PCs (many of them using mobile hardware) don't have that particular issue, and can easily run on a 24/7 basis.

  • It's a little different because cloud and blogs didn't actively get in the way of your home compute. To wit, the various cost spikes for hardware.

    People -- WANT -- this technology on their home devices and (apparently?) the providers of this tech don't seem to be running a profit so they probably don't want the maintenance tail on their side either.

    I think it's a bit different. Inevitable that this becomes a household-run thing? Not likely.

  • The primary feature of a blog or any website is that it is available around the clock, and that is the primary feature of cloud: around-the-clock compute and network that scale on demand.

    The primary feature of "AI" is to process information and reason with a natural language interface at speed, the primary feature of AI bigboys is to provide the machinery that runs the "models".

    See the difference?

  • Running an LLM locally is theoretically viable. Running your blog on your laptop is never viable (unless you hook it up like a server). One just requires compute while the other a stable network.

    • tbh, my home network is pretty close to the stability of my host these days…

      But my downtimes are a bit self-inflicted: changing ISPs, which I can personally work around, but that is harder for a blog where one expects uptime.

  • More like implode proprietary blog hosting platforms and replace them with commodity VMs that can be used for blog hosting, among other things

  • You have to consider that the enshittification factor is much higher now than in the cloud-for-free age.

I disagree. We are currently in a weird period where these frontier AI companies are losing tons of money even on the subscription-based AI models. It's just too compute intensive and there's no way most people are going to be buying the kind of hardware required to run $20 worth of inference every day.

Sadly - it's going to be ads. Advertising is going to get in there and enshittify the whole thing because as always, advertising income is too easy and too plentiful for any company to resist.

Right now the models are fairly agnostic, but we are a hair's breadth away from ChatGPT responding with, "the right tool for this job is a circular saw - something like the Milwaukee M18, which happens to be on sale at Home Depot this weekend."

  • $20/day x 250 days per year x # devs/agents/etc = $$$. About $5k per dev at that daily use case.

    Enough to validate repurposing an existing workstation with enough RAM, or finding a used high VRAM GPU, or in my case buying a Strix Halo system for home lab and local models.

    The future is once again not cloud based, for AI tools.

  • Most people are running a whole lot less than $20's worth of tokens per day on cloud platforms. (Is that assuming a frontier model? 1M output tokens per day?) Local hardware could easily take up that workload, at least the part of it that's non-time-critical.

  • The advertising future looks like that to me, too. Service proxies like OpenRouter might talk about price optimization, maybe some ad filtering. But I expect proxies will have malicious entries, too, surreptitiously altering agentic prompts.

  • Ads are usually the workaround where you don’t deliver enough value to get people to subscribe or payments are unavailable for some reason.

    It makes sense to show some ads and get some money at low volume (like a faraway reader wanting to read a story in your local newspaper) but taking money from regular users directly will pay much more.

    Newspapers are happy to cannibalize 99% of their ad revenue with a paywall if that 1% subscribes because that’s how much more money you make from someone paying $10-$20/month vs ads.

    But yeah, if people use it as a buying recommendation engine, that’s where the money is on ads/referrals but a lot of AI use has little/no connection to buying intent touchpoints.

    • Newspapers had no choice after Craigslist and later Google/Facebook took all their classified revenue.

      LLMs may or may not be able to cover their costs with it. We'll see - I suspect product placement as recommendations will become a thing as it won't take as much GPU to give a "recommendation" on "the best widget for X". I firmly expect it to become enshittified the same way Google and Amazon search have.

      And that's if LLMs don't become commodified.

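The back-of-the-envelope figure in the reply above ($20/day x 250 days, about $5k per dev) can be sketched as a quick calculation. This is a minimal sketch, not anyone's actual pricing: the function names are made up for illustration, the 2000 hardware budget is borrowed from another comment in this thread and treated as dollars, and electricity, maintenance, and any quality gap between local and frontier models are ignored.

```python
def annual_cloud_cost(daily_spend_usd: float, working_days: int = 250, seats: int = 1) -> float:
    """Yearly API spend for `seats` developers or agents (hypothetical helper)."""
    return daily_spend_usd * working_days * seats


def breakeven_days(hardware_cost_usd: float, daily_spend_usd: float) -> float:
    """Working days until a one-off hardware purchase equals cumulative cloud spend.

    Ignores electricity, maintenance, and model-quality differences.
    """
    if daily_spend_usd <= 0:
        raise ValueError("daily_spend_usd must be positive")
    return hardware_cost_usd / daily_spend_usd


if __name__ == "__main__":
    # $20/day for 250 working days, one developer
    print(annual_cloud_cost(20.0))
    # Assumed 2000 USD hardware budget against the same $20/day usage
    print(breakeven_days(2000.0, 20.0))
```

As another reply points out, most people spend far less than $20 per day, so the break-even stretches out quickly as `daily_spend_usd` drops.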

Curious when the NVIDIA monopoly will end. China will surely release something that can run on commodity hardware. I hope they do soon.

I find that hard to believe. The AI companies will want to control what's possible and find new things to do that "need" their services. Otherwise it would be like Intel and Microsoft had decided in the year 2000 that computers are "good enough" now and we would have explored what's possible with that hardware ever since.

  • > Otherwise it would be like Intel and Microsoft had decided in the year 2000 that computers are "good enough" now and we would have explored what's possible with that hardware ever since.

    I think you've misunderstood what good enough means in the context - which is a model capable of completing the tasks assigned to it without having the breadth of full generalization. Your analogy breaks down because of this - we did get 'good enough' spec profiles for different hardware. That thing you're wearing on your wrist won't have the same specifications as the box you use to play games.

    • I think you've misunderstood the analogy. Just ignore it, analogies mostly break down anyways.

      > a model capable of completing the tasks assigned to it

      The thing is, the "task assigned to it" is changing with improved capabilities. If everyone around you in 2036 is using general AI to do amazing stuff, you will probably have little interest in vibe coding slop like it's 2026.

  • >Otherwise it would be like Intel and Microsoft had decided in the year 2000 that computers are "good enough" now and we would have explored what's possible with that hardware ever since.

    That would be the dream... no fucking Electron! No lockdown modules.

  • > The AI companies will want to control what's possible and find new things to do that "need" their services.

    That's correct. The problem is they have smart people, tons of money, and several years to figure that out, and the best thing they can come up with is a coding agent.

Not saying this isn't the case, but my Anthropic subscription costs me less than the electricity would to power such a home inference system.
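One way to check a claim like this against your own setup is to estimate the monthly electricity bill for a home inference box. A rough sketch, where every input (400 W average draw, 8 hours a day, $0.25/kWh, a $20/month subscription) is a hypothetical placeholder rather than a figure from the comment, and the function names are invented for illustration:

```python
def monthly_electricity_cost(avg_watts: float, hours_per_day: float,
                             usd_per_kwh: float, days: int = 30) -> float:
    """Electricity cost of running a machine at `avg_watts` for `hours_per_day`."""
    kwh = (avg_watts / 1000.0) * hours_per_day * days
    return kwh * usd_per_kwh


def power_minus_subscription(avg_watts: float, hours_per_day: float,
                             usd_per_kwh: float, subscription_usd: float) -> float:
    """Positive result: electricity alone costs more than the subscription."""
    return monthly_electricity_cost(avg_watts, hours_per_day, usd_per_kwh) - subscription_usd


if __name__ == "__main__":
    # All inputs below are assumptions; substitute your own measurements.
    power = monthly_electricity_cost(avg_watts=400, hours_per_day=8, usd_per_kwh=0.25)
    diff = power_minus_subscription(400, 8, 0.25, subscription_usd=20.0)
    print(f"electricity: ${power:.2f}/month, difference vs subscription: ${diff:.2f}")
```

The outcome depends heavily on local power prices and how many hours the machine actually spends under load, so the sign of the difference can flip either way.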

Gamers Nexus has a good video on this, but if NVIDIA exits the consumer market, and honestly why would they stay when they can charge up to 100x for the same wafer space for enterprise, AMD would likely do the same. Only Apple really makes consumer hardware suitable for running things locally then, and maybe some weird Qualcomm ARM chip for Windows. It will be hard running things locally if nobody is supplying the hardware.