Comment by ankit219
1 month ago
This was always against the terms of service. Took too long to crack down on it.
There seems to be a lot of FUD about how Anthropic prices it, but I think many devs and harness builders are intentionally misrepresenting things to build pressure.
API capacity comes with specific SLAs, and the infra is provisioned for expected peak load. API demand is notoriously spiky, especially when you don't know who is going to be using it through a third party. At any given time you would see 20%-30% utilization, given that the API is mostly used by tools like Cursor, which have their own routing systems. Subs can use the otherwise unused capacity (with API requests keeping higher priority). The marginal cost of a query is not that high, especially given KV caching and continuous batching (and newer parallelism techniques). With subs, the volume is predictable, the concurrency can be modelled, and hence the price per token is lower.
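To make the arithmetic concrete, here is a rough sketch of why filling idle capacity lowers the average cost per token. Everything in it is an assumption for illustration: the fleet cost, the peak throughput, the 25% API utilization (the midpoint of the 20%-30% range above), and the extra 50% filled by subs are made-up numbers, not Anthropic figures, and `cost_per_token` is a hypothetical helper.

```python
def cost_per_token(fleet_cost_per_hour: float,
                   peak_tokens_per_hour: float,
                   utilization: float) -> float:
    """Average cost per served token when the fleet cost is fixed
    (provisioned for peak) and only `utilization` of it is used."""
    served_tokens = peak_tokens_per_hour * utilization
    return fleet_cost_per_hour / served_tokens


# Illustrative numbers only (assumptions, not real pricing data).
FLEET_COST_PER_HOUR = 1000.0            # fixed cost of peak-sized infra
PEAK_TOKENS_PER_HOUR = 1_000_000_000    # throughput at full utilization

# API traffic alone keeps the fleet about 25% busy.
api_only = cost_per_token(FLEET_COST_PER_HOUR, PEAK_TOKENS_PER_HOUR, 0.25)

# Subs fill another 50% of the otherwise idle capacity.
with_subs = cost_per_token(FLEET_COST_PER_HOUR, PEAK_TOKENS_PER_HOUR, 0.75)

print(f"API only:  {api_only:.2e} per token")
print(f"With subs: {with_subs:.2e} per token")
print(f"Ratio:     {api_only / with_subs:.1f}x")
```

Under these assumptions the same fixed fleet serves three times as many tokens, so the average cost per token drops by the same factor. That only holds while the sub traffic stays inside the idle headroom.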
With third parties reverse engineering that, it breaks the whole thing in two ways. First, it puts more pressure on capacity via subs: Claude Code rarely parallelizes requests, while Cline does it every time (for example). Second, it throws off the peak capacity estimation. You could argue that the same users doing it via subs would have been doing it via the API anyway, but given the pricing difference, the usage is not the same. This ends up affecting people who use the subscription legitimately.
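A back-of-the-envelope sketch of the concurrency point, with invented numbers: if capacity is planned for subscribers who each send one request at a time, a client that fans out several parallel requests per user multiplies the peak by that fan-out. The user count, the activity probability, the fan-out of 4, and the `peak_concurrency` helper are all hypothetical.

```python
import math


def peak_concurrency(users: int, p_active: float,
                     requests_per_user: int, z: float = 3.0) -> float:
    """Rough peak estimate: mean active users plus z standard deviations
    (binomial model), times parallel requests per active user."""
    mean = users * p_active
    std = math.sqrt(users * p_active * (1.0 - p_active))
    return (mean + z * std) * requests_per_user


USERS = 10_000      # assumed subscriber count
P_ACTIVE = 0.05     # assumed chance a given user is active at any moment

# What the provider modelled: one in-flight request per active user.
modelled = peak_concurrency(USERS, P_ACTIVE, requests_per_user=1)

# Same users behind a harness that fans out 4 requests at once (assumed).
parallel = peak_concurrency(USERS, P_ACTIVE, requests_per_user=4)

print(f"Modelled peak:      {modelled:.0f} concurrent requests")
print(f"With 4x fan-out:    {parallel:.0f} concurrent requests")
```

The subscriber count has not changed, but the peak the infra has to absorb is four times what was modelled, which is the sense in which the estimate gets cancelled out.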