Comment by storystarling
7 hours ago
That implies a throughput of around 16 million tokens per second. Since coding agent loops are inherently sequential—you have to wait for the inference to finish before the next step—that volume seems architecturally impossible. You're bound by latency, not just cost.
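For what it's worth, here's the back-of-envelope behind that 16 million figure. The time window isn't pinned down anywhere I can see, so the ~1.4 trillion tokens over 24 hours below is my assumption chosen to reproduce the number:

    # Rough sanity check: what does "trillions of tokens" imply per second?
    # Assumption (not stated anywhere): ~1.4 trillion tokens in a 24-hour window.
    tokens_per_day = 1.4e12
    seconds_per_day = 24 * 60 * 60              # 86,400
    tokens_per_second = tokens_per_day / seconds_per_day
    print(f"{tokens_per_second:,.0f} tokens/second")   # ~16,200,000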
The original post claimed they were "running hundreds of concurrent agents":
https://cursor.com/blog/scaling-agents
It was 2,000 concurrent agents at peak.
I'd still be surprised if that added up to "trillions" of tokens. A trillion is a very big number.
16 million tokens a second across 2,000 agents works out to 8,000 tokens per second per agent. This doesn't seem right to me.
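Spelling out that per-agent math (the 2,000-agent peak comes from above; the aggregate throughput is the implied figure from earlier):

    # Split the implied aggregate throughput across the stated peak agent count.
    aggregate_tokens_per_second = 16_000_000
    concurrent_agents = 2_000
    per_agent = aggregate_tokens_per_second / concurrent_agents
    print(f"{per_agent:,.0f} tokens/second per agent")  # 8,000
    # Typical single-stream decode is on the order of tens to a few hundred
    # tokens/second, far below 8,000, which is why this looks off for
    # sequential agent loops.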