Comment by walterbell

18 days ago

> Using an LLM and caching eg FAQs can save a lot of token credits

Do LLM providers use caches for FAQs, without changing the number of tokens billed to the customer?

No, why would they? You are supposed to maintain that cache yourself.
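For concreteness, a minimal sketch of what maintaining your own cache could look like, client-side and exact-match only (the `llm_call` callable and SHA-256 keying here are my own assumptions, not any provider's API):

```python
import hashlib

# Hypothetical client-side FAQ cache: key on the normalized question,
# so repeated FAQs never hit the paid API at all.
_cache: dict[str, str] = {}

def cached_answer(question: str, llm_call) -> str:
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = llm_call(question)  # billed once per unique question
    return _cache[key]
```

In practice you'd probably layer a semantic cache (embedding similarity) on top, since real user questions rarely repeat verbatim.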

What I really want to know about is caching the large prefixes of prompts. Do providers let you manage this somehow? What about Llama and DeepSeek?
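As far as I know, Anthropic is the one that exposes this explicitly: you mark a cache breakpoint on the large shared prefix with `cache_control`, and cache hits on that prefix are billed at a discounted rate. A minimal sketch with the Anthropic Python SDK (the model name and prefix contents are illustrative):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SHARED_PREFIX = "...thousands of tokens of shared instructions/docs..."

# Mark the large, reused prefix as cacheable; only the short user-specific
# suffix changes per request, so subsequent calls hit the prefix cache.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_SHARED_PREFIX,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    messages=[{"role": "user", "content": "User-specific question here"}],
)
```

OpenAI and DeepSeek instead cache prefixes automatically, with cached input tokens billed at a lower rate and no knob to manage. For self-hosted Llama, inference servers like vLLM support automatic prefix caching (`--enable-prefix-caching`), so there's no billing at all, just saved compute.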