Comment by dvt

18 days ago

So weird/cool/interesting/cyberpunk that we have stuff like this in the year of our Lord 2026:

   ├── MEMORY.md            # Long-term knowledge (auto-loaded each session)
   ├── HEARTBEAT.md         # Autonomous task queue
   ├── SOUL.md              # Personality and behavioral guidance

Say what you will, but AI really does feel like living in the future. As far as the project is concerned, pretty neat, but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`.

I do think that local-first will end up being the future long-term though. I built something similar last year (unreleased) also in Rust, but it was also running the model locally (you can see how slow/fast it is here[1], keeping in mind I have a 3080Ti and was running Mistral-Instruct).

I need to re-visit this project and release it, but building in the context of the OS is pretty mindblowing, so kudos to you. I think that the paradigm of how we interact with our devices will fundamentally shift in the next 5-10 years.

[1] https://www.youtube.com/watch?v=tRrKQl0kzvQ

You absolutely do not have to use a third party llm. You can point it to any openai/anthropic compatible endpoint. It can even be on localhost.

  • Ah true, missed that! Still a bit cumbersome & lazy imo, I'm a fan of just shipping with that capability out-of-the-box (Huggingface's Candle is fantastic for downloading/syncing/running models locally).

    • In local setup you still usually want to split machine that runs inference from client that uses it, there are often non trivial resources used like chromium, compilation, databases etc involved that you don’t want to pollute inference machine with.

    • Ah come on, lazy? As long as it works with the runtime you wanna use, instead of hardcoding their own solution, should work fine. If you want to use Candle and have to implement new architectures with it to be able to use it, you still can, just expose it over HTTP.

      5 replies →

Yes this is not local first, the name is bad.

  • Horrible. Just because you have code that runs not in a browser doesn't mean you have something that's local. This goes double when the code requires API calls. Your net goes down and this stuff does nothing.

    • For a web developer local-first only describes where the state of the program lives. In the case of this app that’s in local files. If anthropics api was down you would just use something else. Something like OpenRouter would support model fallbacks out of the box

    • Not to mention that you can actually have something that IS local AND runs in a browser :D

    • In a world where IT doesn't mean anything, crypto doesn't mean anything, AI doesn't mean anything, AGI doesn't mean anything, End-to-end encryption doesn't mean anything, why should local-first mean anything? We must unite against the tyranny of distinction.

  • It absolutely can be pointed to any standard endpoint, either cloud or local.

    It’s far better for most users to be able to specify an inference server (even on localhost in some cases) because the ecosystem of specialized inference servers and models is a constantly evolving target.

    If you write this kind of software, you will not only be reinventing the wheel but also probably disadvantaging your users if you try to integrate your own inference engine instead of focusing on your agentic tooling. Ollama, vllm, hugging face, and others are devoting their focus to the servers, there is no reason to sacrifice the front end tooling effort to duplicate their work.

    Besides that, most users will not be able to run the better models on their daily driver, and will have a separate machine for inference or be running inference in private or rented cloud, or even over public API.

  • To be precise, it’s exactly as local first as OpenClaw (i.e. probably not unless you have an unusually powerful GPU).

  • Confused me at first as when I saw mention of local + the single file thing in the GitHub I assumed they were going to have llamafile bundled and went looking through to see what model they were using by default.

> but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`.

See here:

https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...

  • What reasonable comparable model can be run locally on say 16GB of video memory compared to Opus 4.6? As far as I know Kimi (while good) needs serious GPUs GTX 6000 Ada minimum. More likely H100 or H200.

    • Devstral¹ has very good models that can be run locally.

      They are in the top of open models, and surpass some closed models.

      I've been using devstral, codestral and Le Chat exclusively for three months now. All from misteals hosted versions. Agentic, as completion and for day-to-day stuff. It's not perfect, but neither is any other model or product, so good enough for me. Less anecdotal are the various benchmarks that put them surprisingly high in the rankings

      ¹https://mistral.ai/news/devstral

    • Nothing will come close to Opus 4.6 here. You will be able to fit a destilled 20B to 30B model on your GPU. Gpt-oss-20B is quite good in my testing locally on a Macbook Pro M2 Pro 32GB.

      The bigger downside, when you compare it to Opus or any other hosted model, is the limited context. You might be able to achieve around 30k. Hosted models often have 128k or more. Opus 4.6 has 200k as its standard and 1M in api beta mode.

      1 reply →

    • I made something similar to this project, and tested it against a few 3B and 8B models (Qwen and Ministral, both the instruction and the reasoning variants). I was pleasantly surprised by how fast and accurate these small models have become. I can ask it things like "check out this repo and build it", and with a Ralph strategy eventually it will succeed, despite the small context size.

> Say what you will, but AI really does feel like living in the future.

Love or hate it, the amount of money being put into AI really is our generation's equivalent of the Apollo program. Over the next few years there are over 100 gigawatt scale data centres planned to come online.

At least it's a better use than money going into the military industry.

IMHO it doesn't make sense, financially and resource wise to run local, given the 5 figure upfront costs to get an LLM running slower than I can get for 20 USD/m.

If I'm running a business and have some number of employees to make use of it, and confidentiality is worth something, sure, but am I really going to rely on anything less then the frontier models for automating critical tasks? Or roll my own on prem IT to support it when Amazon Bedrock will do it for me?

  • That’s probably true only as long as subscription prices are kept artificially low. Once the $20 becomes $200 (or the fast-mode inference quotas for cheap subs become unusably small), the equation may change.

    • This field is highly competitive. Much more than I expected it to. I thought the barrier to entry was so high, only big tech could seriously join the race, because of costs, or training data etc.

      But there's fierce competition by new or small players (deepseek, Mistral etc), many even open source. And Icm convinced they'll keep the prices low.

      A company like openai can only increase subscriptions x10 when they've locked in enough clients, have a monopoly or oligopoly, or their switching costs are multitudes of that.

      So currently the irony seems to be that the larger the AI company, the more loss they're running at. Size seems to have a negative impact on business. But the smaller operators also prevent companies from raising prices to levels at which they make money.

      1 reply →

  • It starts making a lot of sense if you can run the AI workloads overnight on leaner infrastructure rather than insist on real-time response.

  • The usage limits on most 20 USD/month subs are becoming quite restrictive though. API pricing is more indicative of true cost.

> but AI really does feel like living in the future.

Got the same feeling when I put on the Hololens for the first time but look what we have now.

What does ANTHROPIC bring to this project that a local LLM cannot, e.g. Gwen3 Coder Next?

local first is not the future, lmfao, maybe in 10-20 years. It currently cost ~80k-100k to run a pretty meh Kimi 2.5 at decent tok p/s, which is rather useless anyways. And that doesn't allow you to run any multi-agent sessions.

By time hardware costs shrink to allow you to run useful models, concurrently in multi agent environments, they'll have already devalued labor on a scale never before seen.The layoffs and labor will cause us all to work for morsels, on whatever work opportunities remain. Eventually you'll beg to fight in a war.

LLMs are only here to attack labor, devalue the working class and eventually make us useless to the ruling class. LLMs do not create opportunities/jobs, they replace the inputs to labor, humans. That's their only purpose.

But I guess most llm-kiddies think they're going to vibe code their way out of the working class with Anthropic's latest slop offering. Good luck with that. In 5 years your labor will be worth a 1/4 maybe 1/2 of what it is now, and that vibe coded startup of yours will have been made 5000x times over by every other delusional llm-kiddie.

Have fun with your GPU, you won't be able to afford a 60 series, if they even make one, and it certainly won't be powerful enough to pull you out of the black mirror episode we're heading towards.

I recommend learning and not frying your brain with "Think for me Saas", and not being dependent on Meta or Alibaba open sourcing some model that allows you to compete with them.