
Comment by GaggiX

16 hours ago

>to today's subsidized pricing, which they can't keep up forever.

The APIs are not subsidized; they probably have quite a large margin, actually: https://lmsys.org/blog/2025-05-05-large-scale-ep/

>Why would you pay OpenAI when you can host your own hyper efficient Chinese model

The 48 GB of VRAM or unified memory required to run this model at 4-bit quantization is not free either.

I didn't say it's free, but it is about 90% cheaper. Sonnet is $15 per million output tokens; this model just dropped and is available on OpenRouter at $1.40. Even Gemini Flash, which is probably the best price-to-performance API, is generally ranked lower than Qwen's models and costs $2.50, so this is still 44% cheaper.
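
For anyone who wants to check the math, here is a quick sketch of the percentages above. The prices are the per-million-output-token figures quoted in this comment, taken as given rather than checked against current price lists, and `percent_cheaper` is just a helper name made up for illustration.

```python
def percent_cheaper(baseline: float, candidate: float) -> float:
    """Return how much cheaper `candidate` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline price must be positive")
    return (baseline - candidate) / baseline * 100


# Prices in USD per million output tokens, as quoted in the comment (assumed).
SONNET = 15.00
GEMINI_FLASH = 2.50
NEW_MODEL_ON_OPENROUTER = 1.40

if __name__ == "__main__":
    vs_sonnet = percent_cheaper(SONNET, NEW_MODEL_ON_OPENROUTER)
    vs_flash = percent_cheaper(GEMINI_FLASH, NEW_MODEL_ON_OPENROUTER)
    print(f"vs Sonnet: {vs_sonnet:.1f}% cheaper")
    print(f"vs Gemini Flash: {vs_flash:.1f}% cheaper")
```

This only compares output-token list prices; input-token pricing and the cost of self-hosting hardware are left out.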