← Back to context

Comment by jauntywundrkind

1 day ago

If you are trying to run lots of Pi or Hermes or whatever corporate whatever agent junk you have, to make a bunch of always on efficient agentic systems available to people, en masse, with low start-up costs, and high efficiency, there's a host of reasons that doesn't matter.

The big obvious central smoking gun that you'll get to in computer science 200 level classes is Amdahl's Law, which states:

> the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used

You queue up some work for an agent. The LLM is going to do a bunch of work over time, and spend 20 minutes crunching on a task. Let's generously say it takes your PC 2 minute of it's CPU time for it to do the tool calls, to run the build, to run tests. If we expand this to 10 minutes to run it on a phone, that's indeed starting to be a big enough difference to notice. But in 99.9999% of cases, I don't think the harness consumes that much CPU and I don't think the growth factor is 5x to move to phone, and even if it did, it's still only an increase from 22 to 30 minutes: it's an async job either way, and the time budget is not dominated by the phone or PC running the harness.

Ideally yes, there's some intelligence to see: oh, we are about do to a build. Send the build to the build server, that's a 384 core 1U with terabytes of memory bandwidth and let it do that. But most work is not like running builds and tests. The harness doesn't need that. We need some small local computers cheap that we can have lots of running.

Model performance might radically improve in time, and that might change the Amdahl's Law calculations here. If you're paying for Turbo or Plaid or whatever, yeah, you maybe have the money to spend on a better harness too. I'd say that ideally these workloads become live migrate-able, that we can CRIU checkpoint/restore them across systems, ideally, anytime, so that we can give performance people performance when it actually counts, like the build concern above, when the agent is fast. LLM's built for speed like LFM2.5-8B-A1B (DiffuseGemini feels unlikely as it's fast, but low concurrency, but perhaps?), double the speed of many models, so that 20 minutes could become significantly less. But right now it feels like we need a lot of cheap not-performance critical harnesses that can sit around running, and that performance for them is not critical. https://news.ycombinator.com/item?id=47675213