Comment by EGreg
18 days ago
You’re not wrong
Using an LLM and caching, e.g., FAQ answers, can save a lot of token credits.
AI is basically solving a search problem, and the models are just approximations of the data, like linear regression or Fourier transforms.
The training is basically your precalculation. The key is that it precalculates a model with billions of parameters, not overfitting to an exact, random set of answers hehe
> Using an LLM and caching, e.g., FAQ answers, can save a lot of token credits.
Do LLM providers use caches for FAQs, without changing the number of tokens billed to the customer?
No, why would they? You're supposed to maintain that cache yourself.
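A minimal sketch of what that application-side cache can look like, assuming an exact-match lookup on a normalized question and a hypothetical `call_llm()` wrapper around whatever provider SDK you use (real systems often do embedding-based semantic matching instead of exact strings):

```python
import hashlib

faq_cache: dict[str, str] = {}  # in practice: Redis, SQLite, etc.

def normalize(question: str) -> str:
    # Very naive normalization so trivially different phrasings hit the same key.
    return " ".join(question.lower().split())

def answer(question: str, call_llm) -> str:
    """Return a cached answer if this FAQ was seen before,
    otherwise pay for one LLM call and cache the result."""
    key = hashlib.sha256(normalize(question).encode()).hexdigest()
    if key in faq_cache:
        return faq_cache[key]   # no tokens billed
    reply = call_llm(question)  # billed as usual
    faq_cache[key] = reply
    return reply
```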
What I really want to know about is caching the large prefixes of prompts. Do they let you manage this somehow? What about Llama and DeepSeek?
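For what it's worth, as I understand it (as of writing): OpenAI and DeepSeek apply prompt-prefix caching automatically on their side and bill cached input tokens at a discount, Anthropic lets you place explicit cache breakpoints in the request, and with self-hosted Llama it depends on your serving stack (vLLM, for example, has a prefix-caching option). A rough sketch of the explicit style against Anthropic's API; the model name and file path are placeholders:

```python
import anthropic

client = anthropic.Anthropic()

# Large shared prefix (docs, instructions, few-shot examples) that every
# request reuses; path and model name are placeholders.
LONG_PREFIX = open("docs/faq_context.md").read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_PREFIX,
            # Cache breakpoint: everything up to here can be reused by
            # later requests that share the same prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What does the warranty cover?"}],
)

# Usage should report cache reads/writes (e.g. cache_read_input_tokens),
# which are billed differently from normal input tokens.
print(response.usage)
print(response.content[0].text)
```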