Comment by simonw
3 days ago
It's a new, dangerous and wildly popular shape of what I've in the past called a "personal digital assistant" - usually while writing about how hard it is to secure them from prompt injection attacks.
The term is in the process of being defined right now, but I think the key characteristics may be:
- Used by an individual. People have their own Claw (or Claws).
- Has access to a terminal that lets it write code and run tools.
- Can be prompted via various chat app integrations.
- Ability to run things on a schedule (it can edit its own crontab equivalent)
- Probably has access to the user's private data from various sources - calendars, email, files, etc. Very much the lethal trifecta.
Claws often run directly on consumer hardware, but that's not a requirement - you can host them on a VPS or pay someone to host them for you too (a brand new market).
Any suggestions for a specific claw to run? I tried OpenClaw in Docker (with the help of your blog post, thanks) but found it way too wasteful on tokens/expensive. Apparently there are a ton of tweaks to reduce spend, like offloading the heartbeat to a local Ollama model, but I was looking for something more... put together/already thought through.
The pattern I found that works: use a small local model (Llama 3B via Ollama, takes only about 2GB) for heartbeat checks. It just needs to answer "is there anything urgent?", which is a yes/no classification task, not a frontier reasoning task. Reserve the expensive model for actual work. Done right, it can cut token spend by maybe 75% in practice without meaningfully degrading heartbeat quality. The tricky part is the routing logic: deciding which calls go to the cheap model and which actually need the real one. It can be a doozy - I've done this with three lobsters, let me know if you have any questions.
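Here's roughly what that split can look like, as a minimal sketch. It assumes Ollama is serving llama3.2:3b locally on its default port, and the escalation helper is a made-up stand-in for whatever frontier-model call does the real work (none of this is OpenClaw's actual API):

    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def heartbeat_is_urgent(context: str) -> bool:
        """Yes/no classification on the cheap local model; no frontier tokens spent."""
        prompt = (
            "Answer with exactly YES or NO. Is there anything urgent "
            f"in this status summary?\n\n{context}"
        )
        resp = requests.post(OLLAMA_URL, json={
            "model": "llama3.2:3b",   # ~2GB pull, plenty for a binary check
            "prompt": prompt,
            "stream": False,
        }, timeout=60)
        return resp.json()["response"].strip().upper().startswith("YES")

    def dispatch_to_expensive_model(context: str) -> None:
        # Placeholder: this is where the real (expensive) model gets invoked.
        print("escalating:", context[:80])

    def tick(context: str) -> None:
        if heartbeat_is_urgent(context):
            dispatch_to_expensive_model(context)
        # otherwise the heartbeat cost is purely local inference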
Maybe I’m out of touch, but why do you need an LLM to decide if there’s any work to be done? Can’t it just queue or schedule tasks? We already have technology for that that doesn’t require an LLM.
It seems to me it would be a rather useful exercise to have the smaller model make the routing decision and, below a certain confidence threshold, send it to the larger model anyway. Then have the larger model evaluate that choice and perhaps refine the instructions.
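One way to sketch that is to have the small model self-report a confidence score (a weak signal, but cheap); the threshold, model name, and output schema here are all assumptions:

    import json
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"
    CONFIDENCE_FLOOR = 0.7  # below this, don't trust the small model's routing

    def route(task: str) -> str:
        prompt = (
            'Classify this task for routing. Reply with JSON only, like '
            '{"route": "cheap", "confidence": 0.9}, where route is '
            '"cheap" or "expensive" and confidence is 0.0-1.0.\n'
            f"Task: {task}"
        )
        resp = requests.post(OLLAMA_URL, json={
            "model": "llama3.2:3b",
            "prompt": prompt,
            "format": "json",  # Ollama can constrain output to valid JSON
            "stream": False,
        }, timeout=60)
        decision = json.loads(resp.json()["response"])
        if decision.get("confidence", 0.0) < CONFIDENCE_FLOOR:
            return "expensive"  # low confidence: escalate anyway
        return decision.get("route", "expensive")

The larger model could then log whether it agreed with each escalation and feed that back into the routing prompt.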
> but found it way too wasteful on tokens/expensive
I fear this is intrinsic to its architecture. Even if you use smaller models for regular operational tasks (heartbeat checks), you'll inevitably need to promote back to bigger models to do anything useful, and the whole idea of OpenClaw is that it can do many useful things for you, autonomously. I think that means it's going to burn a lot of tokens if you're using it as intended.
This is presumably also why the default mode is to try and OAuth its way into coding-agent harnesses instead of using lab APIs?
Last night, I was able to modify nanoclaw, which runs in a container, to use iMessage (instead of WhatsApp) and GPT-OSS-120B (instead of Claude) hosted on an Nvidia Spark running llama.cpp.
It works, but it's a bit slow when asking for web-based info. It took a couple of minutes to return a stock's closing price. Trying it again this morning returned an answer in a couple of seconds, so perhaps that was just a network blip.
It did get confused when scheduling times, as the UTC datetime was past midnight but my local EST time was before midnight. This caused my test case of “tomorrow morning at 7am send me the current Olympic country medal count” to be scheduled a day later. I told it to assume the EST timezone, and it appeared to work when translating times but not dates.
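That midnight-boundary confusion is easy to reproduce in plain Python; here's a minimal illustration with zoneinfo (the dates are invented, and this is not nanoclaw's code):

    from datetime import datetime, timedelta
    from zoneinfo import ZoneInfo

    EASTERN = ZoneInfo("America/New_York")

    # 11 p.m. Feb 1 in New York is already Feb 2 in UTC...
    now_local = datetime(2026, 2, 1, 23, 0, tzinfo=EASTERN)
    print(now_local.astimezone(ZoneInfo("UTC")))  # 2026-02-02 04:00:00+00:00

    # ...so "tomorrow 7am" computed off the UTC date lands on Feb 3, a day late.
    # Resolving it in the user's timezone first gives the intended Feb 2:
    run_at = (now_local + timedelta(days=1)).replace(hour=7, minute=0)
    print(run_at)  # 2026-02-02 07:00:00-05:00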
Based on the GP's comment, I'm going to try building my own with Pocket Flow and Ollama.
I like ADK; it's lower level and more general, so there is a bit you have to do to get a "claw"-like experience (not that much), and you get (1) a common framework you can use for other things, (2) a lot more places to plug in, and (3) four SDKs to choose from (TS, Go, Py, Java... so far).
It's a lot more work to build a Copilot alternative (IDE integration, CLI). I've done a lot of that with adk-go: https://github.com/hofstadter-io/hof
Just use Gemini Flash for heartbeats.
I spent a few days running openclaw on a VPS, and it was painful and frustrating:
- no graphics subsystem makes things harder
- VPS IP subnets are often blocked by default by numerous websites and WAFs
- can't easily see what it's doing
Running it on its own PC is definitely the golden path for the way it's architected.
> Running it on its own PC is definitely the golden path for the way it's architected.
Not really familiar with the architecture, but would it be possible to run it on a not-so-powerful laptop in a "client" mode, where it would query an LLM running on a beefier desktop?
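In principle that should work with anything that speaks an OpenAI-compatible API: run the model server (Ollama, llama.cpp's server, vLLM...) on the desktop and point the laptop at it. A sketch of the laptop side, where the address and model name are made up:

    from openai import OpenAI

    client = OpenAI(
        base_url="http://192.168.1.50:11434/v1",  # the desktop's LAN address (assumed)
        api_key="unused",  # local servers typically ignore the key
    )

    reply = client.chat.completions.create(
        model="gpt-oss-120b",  # whatever name the desktop server exposes
        messages=[{"role": "user", "content": "ping"}],
    )
    print(reply.choices[0].message.content)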