Comment by rohansood15
16 hours ago
Are you comparing single-user requests or multiple concurrent requests when you say it's comparable to a rented GPU? Most of the cost efficiencies kick in with concurrent/batched requests. A single H100 node can deliver roughly 5k input + 2k output tok/s on a model like Qwen 3.6 35B-A3B with 30+ concurrent requests.
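To put rough numbers on that, here's a back-of-the-envelope sketch. The throughput figures are the ones quoted above; the $2/hour node price and the 100 tok/s single-stream rate are placeholder assumptions, not measurements, so swap in your own.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to process one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# ASSUMPTION: hypothetical rental price for the node, in USD per hour.
node_price_usd_per_hour = 2.0

# From the comment: ~5k input + ~2k output tok/s with 30+ concurrent requests.
batched_tok_s = 5_000 + 2_000

# ASSUMPTION: hypothetical total throughput when serving a single request stream.
single_stream_tok_s = 100

batched_cost = cost_per_million_tokens(node_price_usd_per_hour, batched_tok_s)
single_cost = cost_per_million_tokens(node_price_usd_per_hour, single_stream_tok_s)

print(f"batched:       ${batched_cost:.3f} per 1M tokens")
print(f"single stream: ${single_cost:.3f} per 1M tokens")
print(f"ratio:         {single_cost / batched_cost:.0f}x")
```

The hourly price cancels out of the ratio, so the gap between the two cases depends only on how much throughput batching buys you.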