Comment by thestructuralme

1 month ago

A lot of the scary numbers come from agents being left in “always-on” loops: long context windows, tool calls, retries, and idle GPU time between steps. The right unit isn’t “watts per agent” but something like joules per accepted change (or per useful decision), because an agent that burns 10x energy but replaces 20 minutes of human iteration can still be a net win. What I’d love to see is a breakdown by (1) model/token cost, (2) orchestration overhead (retries, evaluation, tool latency), and (3) utilization (how much time the GPU is actually doing work vs waiting). That’s where the real waste usually hides.
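A back-of-envelope sketch of that accounting (every number, field name, and the 60/40 split below is a made-up illustration, not measured data):

```python
# Hypothetical sketch: split one agent run's energy into the three buckets
# above, then normalize to joules per accepted change. All values illustrative.

from dataclasses import dataclass

@dataclass
class RunEnergy:
    compute_j: float        # (1) model/token cost: joules in actual forward passes
    orchestration_j: float  # (2) retries, evaluation passes, tool-call overhead
    idle_j: float           # (3) GPU powered on but waiting between steps
    accepted_changes: int   # useful output: changes a human actually kept

    @property
    def total_j(self) -> float:
        return self.compute_j + self.orchestration_j + self.idle_j

    @property
    def utilization(self) -> float:
        # Fraction of total draw that went to real work.
        return self.compute_j / self.total_j

    @property
    def joules_per_accepted_change(self) -> float:
        return self.total_j / self.accepted_changes

run = RunEnergy(compute_j=40_000, orchestration_j=25_000,
                idle_j=35_000, accepted_changes=4)
print(f"utilization: {run.utilization:.0%}")                        # 40%
print(f"J per accepted change: {run.joules_per_accepted_change:,.0f}")  # 25,000
```

The point of the toy numbers: an agent can look "efficient" on tokens alone while 60% of its draw is orchestration and idle time, which is exactly where a per-bucket breakdown would catch the waste.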