Comment by Palmik

1 month ago

APIs are usually very profitable. As for subscriptions, it would depend on how many tokens the average subscriber uses per month. Do we have a source of info on this?

Some notes:

- The number of input tokens and output tokens per request matters a lot.

- KV Cache hit rate matters a lot.

- vLLM is not necessarily the most efficient engine.

- You are looking at API cost for DeepSeek V3.2, which is much cheaper than DeepSeek R1 / V3 / V3.1. DeepSeek V3.2 uses a different architecture (sparse attention) that is much more efficient. The cheapest DeepSeek V3 option (fp8) tends to be ~$1/mil output tokens, while R1 tends to be ~$2.5/mil (note that, for example, Together AI charges a whopping $7/mil output tokens for R1!)

As for the cost: You can also get H200s for ~ $1.6/hr and H100s for ~ $1.2/hr. That somewhat simplifies the calculations :)

Ignoring the caveats and assuming H200s, with their setup you will:

- Process 403,200,000 input tokens (403.2 mil).

- Generate 126,720,000 output tokens (126.72 mil).

- Spend $25.6.

- On Together with DS R1 it would cost you $3 * 403.2 + $7 * 126.7 = ~$2096. Together does not even offer a discount for KV cache hits (what a joke :)).

- On NovitaAI with DS R1 it would cost you $0.7 * 403.2 + $2.5 * 126.7 = ~$600 (with a perfect cache hit rate, which gives a 50% discount on input tokens here, it would be ~$458).
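
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in a few lines of Python. This is only a rough sketch: the function and variable names are made up for illustration, the prices are the per-million figures quoted in this comment, and the cache discount is modeled as a flat fraction off the cached share of input tokens, which is an assumption about how billing works. It uses the exact output count (126.72 mil), so the totals land within a dollar of the rounded figures above.

```python
# Token counts quoted above, in millions.
INPUT_MTOK = 403.2
OUTPUT_MTOK = 126.72

# Self-hosted spend and hourly H200 price quoted above (USD).
SELF_HOSTED_USD = 25.6
H200_USD_PER_HOUR = 1.6


def api_cost(input_price, output_price, cache_hit_rate=0.0, cache_discount=0.0):
    """Cost in USD for per-million-token prices.

    Assumption: cached input tokens are billed at (1 - cache_discount)
    of the normal input price; output tokens are never discounted.
    """
    input_multiplier = 1.0 - cache_hit_rate * cache_discount
    return input_price * INPUT_MTOK * input_multiplier + output_price * OUTPUT_MTOK


together = api_cost(3.0, 7.0)  # no cache discount offered
novita = api_cost(0.7, 2.5)    # no cache hits
novita_cached = api_cost(0.7, 2.5, cache_hit_rate=1.0, cache_discount=0.5)

# GPU-hours implied by the quoted spend at the quoted hourly price.
gpu_hours = SELF_HOSTED_USD / H200_USD_PER_HOUR

for name, cost in [
    ("Together R1", together),
    ("NovitaAI R1", novita),
    ("NovitaAI R1, perfect cache", novita_cached),
]:
    print(f"{name}: ${cost:,.2f} ({cost / SELF_HOSTED_USD:.0f}x self-hosted)")
print(f"Implied GPU-hours: {gpu_hours:.1f}")
```

Changing `cache_hit_rate` to something more realistic than 1.0 shows how quickly the input side of the bill moves, which is the point of the KV cache note above.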
