Comment by mvdtnz

2 days ago

Why do you mean by this? What cache?

Generally speaking, there's prompt caching that can be enabled in the API with things like this: https://platform.claude.com/docs/en/build-with-claude/prompt...

For a specific harness, they've all found ways to optimize to get higher cache hit rates with their harness. Common system prompts and all, and more and more users hitting cache really makes the cost of inference go down dramatically.

What bothers me about a lot of the discussion about providers disallowing other harnesses with the subscription plans around here is the complete lack of awareness of how economies of scale from common caching practices across more users can enable the higher, cheaper quotas subscriptions give you.

  • I feel like a lot of this would go away if they made a different API for the “only for use with our client” subscriptions. A different API from the generic one, that moved some of their client behaviors up to the server seems like it would solve all this. People would still reverse engineer to use it in other tools but it would be less useful (due to the forced scaffolding instead of entirely generic completions API) and also ease the burden on their inference compute.

    I’m sure they went with reusing the generic completions API to iterate faster and make it easier to support both subscription and pay-per-token users in the same client, but it feels like they’re burning trust/goodwill when a technical solution could at least be attempted.

    • > I feel like a lot of this would go away if they made a different API for the “only for use with our client” subscriptions.

      They literally did exactly that. That's what being cut off (Antigravity access, i.e. private "only for use with our client" subscription - not the whole account, btw.) for people who do "reverse engineer to use it in other tools".

      Nothing here is new or surprising, the problem has been the same since Anthropic released Claude Code and the Max subscriptions - first thing people did then was trying to auth regular use with Claude Code tokens, so they don't have to pay the API prices they were supposed to.

      1 reply →