Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library
← Back to context

Comment by rohansood15

16 hours ago

Are you comparing single-user requests or multiple concurrent requests when you say comparable to rented GPU? Most of the cost efficiencies kick in with concurrent/batch requests. A single H100 node can provide like 5k input + 2k output tok/s on a model like Qwen 3.6 35B-A3B with 30+ concurrent requests.

0 comments

rohansood15

Reply

No comments yet

Contribute on Hacker News ↗

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities