Comment by storystarling

14 days ago

That implies a throughput of around 16 million tokens per second. Coding agent loops are inherently sequential: each step has to wait for the previous inference call to finish, so that volume seems architecturally impossible. You're bound by latency, not just cost.
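
Back-of-envelope check in Python (a sketch; the parent's exact figures aren't quoted here, so I'm assuming "trillions" means roughly 1.4 trillion tokens over a single 24-hour day):

    # Rough sanity check of the implied throughput.
    # Assumption (not from the thread): ~1.4e12 tokens in one day.
    tokens = 1.4e12
    seconds = 24 * 60 * 60       # 86,400 seconds in a day
    print(tokens / seconds)      # ~16.2 million tokens/second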

The original post claimed they were "running hundreds of concurrent agents":

https://cursor.com/blog/scaling-agents

  • It was 2,000 concurrent agents at peak.

    I'd still be surprised if that added up to "trillions" of tokens. A trillion is a very big number.
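
    To put numbers on that, the same sketch extended per agent (same assumed ~1.4 trillion/day figure as above; the 2,000-agent peak is from the reply):

        # Per-agent rate needed to hit the assumed volume.
        # Assumes ~1.4e12 tokens/day and 2,000 agents running
        # flat-out, concurrently, for the full day.
        per_agent = (1.4e12 / 86_400) / 2_000
        print(per_agent)   # ~8,100 tokens/second per agent
        # Single-stream decode is typically on the order of
        # 50-200 tokens/second, i.e. ~40-160x slower than needed.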