Comment by dvt

18 days ago

So weird/cool/interesting/cyberpunk that we have stuff like this in the year of our Lord 2026:

   ├── MEMORY.md            # Long-term knowledge (auto-loaded each session)
   ├── HEARTBEAT.md         # Autonomous task queue
   ├── SOUL.md              # Personality and behavioral guidance

Say what you will, but AI really does feel like living in the future. As far as the project is concerned, pretty neat, but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`.

I do think that local-first will end up being the future long-term though. I built something similar last year (unreleased) also in Rust, but it was also running the model locally (you can see how slow/fast it is here[1], keeping in mind I have a 3080Ti and was running Mistral-Instruct).

I need to re-visit this project and release it, but building in the context of the OS is pretty mindblowing, so kudos to you. I think that the paradigm of how we interact with our devices will fundamentally shift in the next 5-10 years.

[1] https://www.youtube.com/watch?v=tRrKQl0kzvQ

57 comments

dvt

halJordan 18 days ago

You absolutely do not have to use a third party llm. You can point it to any openai/anthropic compatible endpoint. It can even be on localhost.

dvt 18 days ago
Ah true, missed that! Still a bit cumbersome & lazy imo, I'm a fan of just shipping with that capability out-of-the-box (Huggingface's Candle is fantastic for downloading/syncing/running models locally).
- mirekrusin 18 days ago
  
  In local setup you still usually want to split machine that runs inference from client that uses it, there are often non trivial resources used like chromium, compilation, databases etc involved that you don’t want to pollute inference machine with.
- embedding-shape 18 days ago
  
  Ah come on, lazy? As long as it works with the runtime you wanna use, instead of hardcoding their own solution, should work fine. If you want to use Candle and have to implement new architectures with it to be able to use it, you still can, just expose it over HTTP.
  
  5 replies →

backscratches 18 days ago

Yes this is not local first, the name is bad.

outofpaper 18 days ago
Horrible. Just because you have code that runs not in a browser doesn't mean you have something that's local. This goes double when the code requires API calls. Your net goes down and this stuff does nothing.
- jdejean 17 days ago
  
  For a web developer local-first only describes where the state of the program lives. In the case of this app that’s in local files. If anthropics api was down you would just use something else. Something like OpenRouter would support model fallbacks out of the box
- konart 18 days ago
  
  Not to mention that you can actually have something that IS local AND runs in a browser :D
- yusuf288 18 days ago
  
  In a world where IT doesn't mean anything, crypto doesn't mean anything, AI doesn't mean anything, AGI doesn't mean anything, End-to-end encryption doesn't mean anything, why should local-first mean anything? We must unite against the tyranny of distinction.
K0balt 18 days ago
It absolutely can be pointed to any standard endpoint, either cloud or local.
It’s far better for most users to be able to specify an inference server (even on localhost in some cases) because the ecosystem of specialized inference servers and models is a constantly evolving target.
If you write this kind of software, you will not only be reinventing the wheel but also probably disadvantaging your users if you try to integrate your own inference engine instead of focusing on your agentic tooling. Ollama, vllm, hugging face, and others are devoting their focus to the servers, there is no reason to sacrifice the front end tooling effort to duplicate their work.
Besides that, most users will not be able to run the better models on their daily driver, and will have a separate machine for inference or be running inference in private or rented cloud, or even over public API.
- backscratches 18 days ago
  
  It is not local first. Local is not the primary use case. The name is misleading to the point I almost didn't click because I do not run local models.
  
  4 replies →
lxgr 18 days ago
To be precise, it’s exactly as local first as OpenClaw (i.e. probably not unless you have an unusually powerful GPU).
- backscratches 18 days ago
  
  Yes but OpenClaw (which is a terrible name for other reasons) doesn't have "local" in the name and so is not misleading.
  
  7 replies →
ciaranmca 18 days ago

Confused me at first as when I saw mention of local + the single file thing in the GitHub I assumed they were going to have llamafile bundled and went looking through to see what model they were using by default.

atmanactive 18 days ago

> but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`.

See here:

https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...

nodesocket 18 days ago
What reasonable comparable model can be run locally on say 16GB of video memory compared to Opus 4.6? As far as I know Kimi (while good) needs serious GPUs GTX 6000 Ada minimum. More likely H100 or H200.
- berkes 18 days ago
  
  Devstral¹ has very good models that can be run locally.
  They are in the top of open models, and surpass some closed models.
  I've been using devstral, codestral and Le Chat exclusively for three months now. All from misteals hosted versions. Agentic, as completion and for day-to-day stuff. It's not perfect, but neither is any other model or product, so good enough for me. Less anecdotal are the various benchmarks that put them surprisingly high in the rankings
  ¹https://mistral.ai/news/devstral
- mixermachine 18 days ago
  
  Nothing will come close to Opus 4.6 here. You will be able to fit a destilled 20B to 30B model on your GPU. Gpt-oss-20B is quite good in my testing locally on a Macbook Pro M2 Pro 32GB.
  The bigger downside, when you compare it to Opus or any other hosted model, is the limited context. You might be able to achieve around 30k. Hosted models often have 128k or more. Opus 4.6 has 200k as its standard and 1M in api beta mode.
  
  1 reply →
- lodovic 18 days ago
  
  I made something similar to this project, and tested it against a few 3B and 8B models (Qwen and Ministral, both the instruction and the reasoning variants). I was pleasantly surprised by how fast and accurate these small models have become. I can ask it things like "check out this repo and build it", and with a Ralph strategy eventually it will succeed, despite the small context size.
- PeterStuer 18 days ago
  
  Nothing close to Opus is available in open weights. That said, do all your tasks need the power of Opus?
  
  2 replies →

__mharrison__ 18 days ago

I'm playing with local first openclaw and qwen3 coder next running on my LAN. Just starting out but it looks promising.

bluerooibos 18 days ago
On what sort of hardware/RAM? I've been trying ollama and opencode with various local models on a 16Gb RAM, but the speed, and accuracy/behaviour just isn't good enough yet.
- __mharrison__ 17 days ago
  
  DGX Spark (128gb)

fy20 18 days ago

> Say what you will, but AI really does feel like living in the future.

Love or hate it, the amount of money being put into AI really is our generation's equivalent of the Apollo program. Over the next few years there are over 100 gigawatt scale data centres planned to come online.

At least it's a better use than money going into the military industry.

T-A 18 days ago

The Apollo program was peanuts in comparison:
https://www.wsj.com/tech/ai/ai-spending-tech-companies-compa...
https://www.reuters.com/graphics/USA-ECONOMY/AI-INVESTMENT/g...
jazzyjackson 18 days ago

What makes you think AI investment isn't a proxy for military advantage? Did you miss the saber rattling of anti-regulation lobbying, that we cannot pause or blink or apply rules to the AI industry because then China would overtake us?
adammarples 18 days ago

You know they will never come on line. A lot of it is letters of intention to invest with nothing promised, mostly to juice the circular share price circuils.
ryan_n 18 days ago

Most of these AI companies are part of the military industry. So the money is still going there at the end of the day.
pwndByDeath 18 days ago

LoL, don't worry they are getting their dose of the snakeoil too

jazzyjackson 18 days ago

IMHO it doesn't make sense, financially and resource wise to run local, given the 5 figure upfront costs to get an LLM running slower than I can get for 20 USD/m.

If I'm running a business and have some number of employees to make use of it, and confidentiality is worth something, sure, but am I really going to rely on anything less then the frontier models for automating critical tasks? Or roll my own on prem IT to support it when Amazon Bedrock will do it for me?

Sharlin 18 days ago
That’s probably true only as long as subscription prices are kept artificially low. Once the $20 becomes $200 (or the fast-mode inference quotas for cheap subs become unusably small), the equation may change.
- berkes 18 days ago
  
  This field is highly competitive. Much more than I expected it to. I thought the barrier to entry was so high, only big tech could seriously join the race, because of costs, or training data etc.
  But there's fierce competition by new or small players (deepseek, Mistral etc), many even open source. And Icm convinced they'll keep the prices low.
  A company like openai can only increase subscriptions x10 when they've locked in enough clients, have a monopoly or oligopoly, or their switching costs are multitudes of that.
  So currently the irony seems to be that the larger the AI company, the more loss they're running at. Size seems to have a negative impact on business. But the smaller operators also prevent companies from raising prices to levels at which they make money.
  
  1 reply →
zozbot234 18 days ago

It starts making a lot of sense if you can run the AI workloads overnight on leaner infrastructure rather than insist on real-time response.
zipy124 18 days ago

The usage limits on most 20 USD/month subs are becoming quite restrictive though. API pricing is more indicative of true cost.

croes 18 days ago

> but AI really does feel like living in the future.

Got the same feeling when I put on the Hololens for the first time but look what we have now.

mycall 18 days ago

What does ANTHROPIC bring to this project that a local LLM cannot, e.g. Gwen3 Coder Next?

IhateAI 18 days ago

local first is not the future, lmfao, maybe in 10-20 years. It currently cost ~80k-100k to run a pretty meh Kimi 2.5 at decent tok p/s, which is rather useless anyways. And that doesn't allow you to run any multi-agent sessions.

By time hardware costs shrink to allow you to run useful models, concurrently in multi agent environments, they'll have already devalued labor on a scale never before seen.The layoffs and labor will cause us all to work for morsels, on whatever work opportunities remain. Eventually you'll beg to fight in a war.

LLMs are only here to attack labor, devalue the working class and eventually make us useless to the ruling class. LLMs do not create opportunities/jobs, they replace the inputs to labor, humans. That's their only purpose.

But I guess most llm-kiddies think they're going to vibe code their way out of the working class with Anthropic's latest slop offering. Good luck with that. In 5 years your labor will be worth a 1/4 maybe 1/2 of what it is now, and that vibe coded startup of yours will have been made 5000x times over by every other delusional llm-kiddie.

Have fun with your GPU, you won't be able to afford a 60 series, if they even make one, and it certainly won't be powerful enough to pull you out of the black mirror episode we're heading towards.

I recommend learning and not frying your brain with "Think for me Saas", and not being dependent on Meta or Alibaba open sourcing some model that allows you to compete with them.