← Back to context

Comment by doesnt_know

19 hours ago

How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain?

You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.

> How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain?

1. Compute scaling with the length of the sequence is applicable to transformer models in general, i.e. every frontier LLM since ChatGPT's initial release.

2. As undocumented changes happen frequently, users should be even more incentivized to at least try to have a basic understanding of the product's cost structure.

> You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.

I think "internal technical implementation" is a stretch. Users don't need to know what a "transformer" is to understand the trade-off. It's not trivial but it's not something incomprehensible to laypersons.

I use CC, and I understand what caching means.

I have no idea how that works with a LLM implementation nor do I actually know what they are caching in this context.

  • They are caching internal LLM state, which is in the 10s of GB for each session. It's called a KV cache (because the internal state that is cached are the K and V matrices) and it is fundamental to how LLM inference works; it's not some Anthropic-specific design decision. See my other comment for more detail and a reference.

> 99.99% of users won't even understand the words that are being used.

That's a bad estimate. Claude Code is explicitly a developer shaped tool, we're not talking generically ChatGPT here, so my guess is probably closer to 75% of those users do understand what caching is, with maybe 30% being able to explain prompt caching actually is. Of course, those users that don't understand have access to Claude and can have it explain what caching is to them if they're interested.