Comment by woah
1 day ago
From my back-of-the-envelope analysis for my own projects, paying per token on OpenRouter is competitive with, if not cheaper than, running the same open-weight model on a rented GPU. Per-token pricing for closed frontier models is in the same ballpark as for open-weight models, although more expensive (cents to dollars per million tokens). To me this says that the pricing is somewhat grounded in reality.
Are you comparing single-user requests or multiple concurrent requests when you say it's comparable to a rented GPU? Most of the cost efficiencies kick in with concurrent/batch requests. A single H100 node can provide something like 5k input + 2k output tok/s on a model like Qwen 3.6 35B-A3B with 30+ concurrent requests.
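To make the comparison concrete, here's a rough sketch of the arithmetic for turning throughput into dollars per million tokens. The hourly price, the single-stream throughput and the utilization figure are all placeholders I made up for illustration, not quotes or measurements. Only the 5k + 2k tok/s figure comes from the numbers above. It also lumps input and output tokens together, which hosted APIs price separately, so treat the result as an order-of-magnitude check.

```python
def cost_per_million_tokens(hourly_usd: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """USD per one million tokens on a rented machine.

    utilization is the fraction of paid time the machine is actually
    serving requests (1.0 means it is busy around the clock).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_usd / tokens_per_hour * 1_000_000


# ASSUMPTION: placeholder rental price, not a quote from any provider.
HOURLY_USD = 3.00
# From the comment: ~5k input + ~2k output tok/s with 30+ concurrent requests.
BATCHED_TPS = 5_000 + 2_000
# ASSUMPTION: hypothetical throughput for one user sending one request at a time.
SINGLE_STREAM_TPS = 100

for label, tps, util in [
    ("batched, always busy", BATCHED_TPS, 1.0),
    ("batched, half idle", BATCHED_TPS, 0.5),
    ("single stream, always busy", SINGLE_STREAM_TPS, 1.0),
]:
    usd = cost_per_million_tokens(HOURLY_USD, tps, util)
    print(f"{label}: ${usd:.2f} per million tokens")
```

Swap in your own rental price and measured throughput. The point is just that the per-token cost scales inversely with both throughput and utilization, so a single-user workload and a saturated batch workload on the same hardware can land far apart.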