Comment by cobertos
2 days ago
Any suggestions for a specific claw to run? I tried OpenClaw in Docker (with the help of your blog post, thanks) but found it way too wasteful on tokens/expensive. Apparently there's a ton of tweaks to reduce spent by doing things like offloading heartbeat to a local Ollama model, but was looking for something more... put together/already thought through.
> but found it way too wasteful on tokens/expensive
I fear this is intrinsic to its architecture. Even if you use smaller models for regular operational tasks (checking heartbeat), you'll inevitably need to promote back to bigger models to do anything useful, and the whole idea of openclaw is that it can do many useful things for you, autonomously. I think that means it's going to burn a lot of tokens if you're using it as intended.
This is presumably also why the default model mode is to try and oauth its way into coding agent harnesses instead of using lab API's?
The pattern I found that works ,use a small local model (llama 3b via Ollama, takes only about 2GB) for heartbeat checks — it just needs to answer 'is there anything urgent?' which is a yes/no classification task, not a frontier reasoning task. Reserve the expensive model for actual work. Done right, it can cut token spend by maybe 75% in practice without meaningfully degrading the heartbeat quality. The tricky part is the routing logic — deciding which calls go to the cheap model and which actually need the real one. It can be a doozy — I've done this with three lobsters, let me know if you have any questions.
It seems to me like it would be a rather useful exercise to have the smaller model make the routing decision, and below certain confidence thresholds, it sends it to a larger model anyways. Then have the larger model evaluate that choice and perhaps refine instructions.
That's a cleaner implementation than what I described. Small model as meta-router: classify locally, escalate only when confidence is low. The self-evaluation loop you're suggesting would add a quality layer without much overhead — the large model's judgment of its own routing is itself a useful signal. Haven't shipped that yet but it's on the list.
Maybe I’m out of touch but why do you need an LLM to decide if there’s any work to be done? Can’t it just queue or schedule tasks? We already have technology for that that doesn’t require an LLM.
Tasks might have prerequisites or conditions.
Like "if it's raining, remind me to grab my umbrella before I leave for work"
-> "is it raining?" requires a tool call to a weather service
-> "before I leave for work" needs access to the user's calendar and information when they leave compared to the time their work day starts
-> "remind me" needs a way to communicate to the user in an efficient way, Telegram, iMessage or Whatsapp for example.
Totally valid for fixed, well-defined tasks — a cron job is cheaper and more reliable there. The LLM earns its keep when the heartbeat involves contextual judgment: not just "is there a task in the queue" but "given everything happening right now, what actually matters?" If the agent needs to reason about priority, relevance, or context before deciding what to surface — that's where the local model pulls its weight. If your agents only do fixed tasks, you're totally right, you don't need it!
Last night, I was able to modify nanoclaw, which runs in a container, to use iMessage(instead of whatsapp ) and use GPT-OSS-120B(instead of Claude) hosted on a Nvidia spark running llama.cpp.
It works but a bit slow when asking for web based info. Took a couple of minutes to return a stock price closing value. Trying it again this morning returned an answer in a couple of seconds so perhaps that was just a network blip.
It did get confused when scheduling times as the UTC date time was past midnight but my local EST time was before midnight. This caused my test case case of “tomorrow morning at 7am send me the current Olympic county medal count” test to be scheduled a day later. I told it to assume EST timezone and it appeared to work when translating times but not dates.
Based off the gp's comment, I'm going to try building my own with pocket flow and ollama.
I like ADK, it's lower level and more general, so there is a bit you have to do to get a "claw" like experience (not that much) and you get (1) a common framework you can use for other things (2) a lot more places to plug in (3) four SDKs to choose from (ts, go, py, java... so far)
It's a lot more work to build a Copilot alternative (ide integration, cli). I've done a lot of that with adk-go, https://github.com/hofstadter-io/hof
Just use Google flash for heartbeats
[dead]