Comment by neals

1 day ago

I'm curious how hardware and power cost would stack up to subscription cost

Right now - there's some heavily subsidized subscriptions that are more or less cheating. For instance, Github CoPilot at $39/month gives you claude opus 4.6. They're going to close that off, but right now it's like a freebie for those doing API agentic harnesses.

That said, if you are doing always on agents and you spend $3k-$4k on a GB10 or, $5+ k on Apple Silicon as your sunk cost, you will probably come out ahead.

I've got 5 agents running a purely experimental social experiment. AThey operate in an evennia mud (a familiar sounding city called "gothmud). I've built a channel, idle prompts, sleep schedule. I feed in real world news, weather. There's a character up in a clock tower that reads evennia's audit logs every 20 minutes to surveil the city, and a cast of people wandering around, investigating things, having coffee, repairing robots. This is all hitting qwen3.6-35-A3B on the Asus GB10, which cost me $3k.

Over the last 30 days, I've hit 394M input tokens, 1.6B output tokens. I would have spent between $1600 to $1700 if I was using openrouter. Not calculated - I also have comfyui running in the spare space, and the agents "take photos" of the rooms they're in, selfies, workshop photos, etc.

How much did I spend on electricity? I don't have a meter on my box. My total electric bill for the last 30 days was $220, so I know it's less than that. My rate to compare is 11.7/kwh, but it's closer to 15c/Kwh total. The Asus GX10 has a 240W power supply, and it's probably only pulling 180. I estimate $15-$20/month. But worst case red-lining. 240 Watts, 720 hours = 172KWH , and at $0.20, I come to $35

Here's the kicker thought - that github copilot subscription I mentioned? I have another agent running on that, reading all my other agent logs, managing my obsidian notes, doing research, sending briefings. And all by itself, it used almost the same amount of claude-opus tokens for that $39/month subscription. I was actually a bit shocked when I pulled a recent report and saw that. I'm working to migrate functionality away from copilot subscription to the local model. A lot of the initial setup might have needed it, but not the ongoing review style work it does.

  • Have you learned anything interesting from your agent ant farm?

    • A few things. I replied to someone else above, but I feed lessons learned from my social ant farm agents back into more productive agents.

      Memory recall:

      Lots of systems out there to give agents memory. I've used a bunch and written a couple. Storing memories is easy, but getting an agent to recall them, no matter how much you mention it in your AGENT/CLAUDE.md is a bit of an uphill battle. I've even watched claude make useful project memories and never refer to them again.

      In my agent ant farm - agents go "to sleep" at night. They get nudged to head home, once there they get prompted to make notes about their day, about other characters. Then we do a compact with custom instructions. After compact/sleep cycle, if they enter a room with one of the characters in their notes, that gets loaded back into context automatically.

      That all boils down to hooks in Pi like before_agent_turn. You can intercept a prompt, check it against code/flat files, and smartly inject more information into context. You can have a long running main session with compacts that discard procedural bits and offload the rest to memory.

      Time Awareness:

      Agents have no concept of time. You can send them a message at 5am, then at 10pm, and it's been 2 turns for them. For coding, this is fine. But for assistant level stuff, adding a message like "It's 3PM. It has been 3 hours since the last interaction with the user" goes a long way. Without me saying something like "new topic", it knows now that time has passed, i'm probably onto something new. If I left something hanging, it will remind me about it, or maybe go check on things that should have happened during the day.

      Inner Thoughts/Idle nudges:

      I can have an extension run every 5 seconds, check a a schedule, check activity level of the main session and fire off nudges on the main session. These look like the user sent it, but I generally prefix it with [inner thought]. For my social bot, I tested this along the lines of "[inner thought] it's been 3 hours since you last talked with user, why not reach out, let him know what's new, maybe send a selfie or a photo of where you are". For my assistant bot, it's an 8am, 3pm, and 7pm nudge along the lines of "[inner thought] put together an activity report of work things that has changed since the last report". This all runs in the main context, they get the thought, have historical context, can run skill to check on vault updates, open beads, anything observed from ingesting other agent sessions, and sends me a summary. It take into account my idle factor. If I'm heavily engaged in conversation at 3PM, the report might get delayed 15 minutes or an hour, or skipped altogether.

      1 reply →

  • > experiment

    What is the experiment? What are you hoping to learn from all this?

    Or do you just mean you've made a dynamic dollhouse that you think is cool? The Sims on your own terms?

    • A little of A, a little of B. I have a lot of fun building it out, it's surpassed Factorio in addiction, and I've been able to flesh out some patterns that I roll back into more productive agent harness bits.

      For A:

      The learning is in building agent harnesses that aren't just cron jobs reading a file like HEARTBEAT.md. I have some serious tools for my own use. One main assistant/coordinator agent, one SRE/coder agent (with sub-agents of its own).

      I originally just started last year with the AI assistant (Jane from enderverse). Along the way building scheduled systems, hand offs to other agents, etc. As I ran into problems, I'd be rewriting and refactoring. So I spent some time making a low-stake hatbot with history and routines. Instead a from-scratch golang harness, I built it around pi and extensions. Time of day prompt splices (extensions can inject into or modify prompts on the fly, wake up reminders. Things that you do in the main session vs spinning up an ephemeral session. Self improvement daydreaming (modify your own skills and AGENTS.md) A lot of that went back into rebuilding Jane to something more useful for me.

      For B:

      The "dynamic dollhouse" as you put it was seeing where I could take that living chatbot next. There's a lot of projects pointing agents at slack, discord, message boards. I figured why not a mud with rooms, weather, and props. Lots of interesting challenges. How to keep bots from nesting in their own room, how to keep them from yes-anding each other all day long. How to slow down 3 bots talking at each other so a human can get a word in edge-wise.

      Different levels. There's plain old NPCs that have dice roll random responses. There's LLM driven NPCs that only remember the last 5-10 messages. And the main ones are bot agents. Full agent harness, moving around the environment. Long lived context windows. One character (a nurse at the hospital) gets into arguments with an NPC receptionist that treats her as another patient. Complains about it to other characters, they remember and the word spreads.

      The agents get prompted to write down notes, the head home for sleep (and session compaction). Next time they enter a room with that person after compact, their notes get loaded automatically. This kind of behavior can feed back into the more productivity based agents.

      1 reply →

For open models, usually not well. You get 5+ providers competing on cost, all with cheaper electricity and better hardware utilization than your local setup

I did an estimate of that if you're interested: https://x.com/pwnies/status/2028831699736637912

The TL;DR though is that a 10-15b param model baked into an ASIC with the latest fab tech would take around 62W of power draw when active. At ~10k+ t/s though it likely would only be active for short bursts of time. It'd fit perfectly fine within the thermal envelope of a laptop.

The approach makes a lot of sense. Once you get to those speeds, latency of the network becomes one of the bigger bottlenecks, so local has a real advantage over a subscription.

  • You're not counting the capex which could be the same cost as 5-10 years of Claude.

    • This assume Claude's price doesn't change. Which isn't a great assumption considering inference providers are moving to usage based billing. Also the VC money isn't going to last indefinitely. Current inference providers are being subsidized with VC money at this point.

      1 reply →

  • Is latency of the network that noticeable? Aren’t we talking low hundreds of ms at worst here? Much lower for something close regionally.