← Back to context

Comment by londons_explore

8 hours ago

The providers must have a more efficient approach. Most cache every request for 12+ hours, and they certainly can't spare 100GB of ram per request for 12 hours.

> 12 hours

have things changed around this recently? I know openai optionally allows 24 hours but thought it was ~1h without that, and anthropic used to quote 5-15 minutes or something.

  • Anthropic is 5 minutes, though you can pay more to get 60 minutes I believe.

    The #1 was to make Claude code token quota go further is to never let the 5 minute cache TTL expire. Either send a new request within the window, or use /clear and copy/paste, or use /clear and a framework that automatically generates session state that gets replayed from files after /clear.

or maybe they don’t actually cache (fully) but lie and just don’t charge the user right now. at least half the users, who are probably also using the most similar tokens / prompts, wouldn’t really know the difference in latency (or care)