Comment by zoogeny
7 hours ago
As an aside, I built a tool to manage my own chat interface over the provider APIs. I added caching because the savings are quite significant, and I have a little countdown timer that shows me how much time remains until the cache expires.
However, for basic turn-based conversation the cache lifetime (5 minutes) is almost always too short. By the time I read the LLM response, consider my next question, and write it out, I have frequently missed the cache.
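For illustration, a countdown like the one described above might be sketched as follows. This is a minimal sketch, not the commenter's actual tool: the `CacheCountdown` class and its method names are hypothetical, and it assumes the 5-minute lifetime restarts every time a request writes or reads the cached prefix, which may not match every provider.

```python
import time


class CacheCountdown:
    """Tracks how long until a prompt cache entry is assumed to expire.

    Hypothetical helper: the 300-second default mirrors the 5-minute
    lifetime mentioned above, and the refresh-on-use behavior is an
    assumption about the provider.
    """

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so the timer can be tested
        self._last_touch = None      # None until the first request is sent

    def touch(self):
        """Call after each request that writes or reads the cached prefix."""
        self._last_touch = self._clock()

    def remaining(self):
        """Seconds left before the cache is assumed to be gone (never negative)."""
        if self._last_touch is None:
            return 0.0
        elapsed = self._clock() - self._last_touch
        return max(0.0, self.ttl_seconds - elapsed)

    def is_warm(self):
        """True while the next request is still expected to hit the cache."""
        return self.remaining() > 0.0

    def label(self):
        """Format the remaining time as M:SS for display in the UI."""
        secs = int(self.remaining())
        return f"{secs // 60}:{secs % 60:02d}"
```

A chat UI would call `touch()` when a response arrives and redraw `label()` once per second while the user is typing.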
I imagine it is much more useful if you have a tool with a common prefix (like a system instruction, tool specs, or a shared set of context across many users).
If you can hit the cache frequently enough, the savings are well worth it.
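How often is "frequently enough" depends on pricing. As a rough sketch, assume a cache write costs some multiple of the normal input price and a cache read costs a fraction of it. The default multipliers below (1.25 and 0.10) are illustrative assumptions, not quoted prices for any particular provider, and the function names are hypothetical.

```python
def relative_cost(hit_rate, write_multiplier=1.25, read_multiplier=0.10):
    """Expected cost of the cached prefix per request, relative to no caching.

    A hit pays the read price; a miss pays the write price because the
    prefix has to be cached again. 1.0 means "same cost as not caching".
    The default multipliers are illustrative assumptions only.
    """
    return hit_rate * read_multiplier + (1.0 - hit_rate) * write_multiplier


def break_even_hit_rate(write_multiplier=1.25, read_multiplier=0.10):
    """Hit rate at which caching costs the same as not caching.

    Solves hit_rate * read + (1 - hit_rate) * write == 1 for hit_rate.
    """
    return (write_multiplier - 1.0) / (write_multiplier - read_multiplier)


if __name__ == "__main__":
    print(f"break-even hit rate: {break_even_hit_rate():.1%}")
    for rate in (0.0, 0.25, 0.5, 0.9):
        print(f"hit rate {rate:.0%}: relative cost {relative_cost(rate):.2f}")
```

Under these assumed multipliers the break-even hit rate is a little over 20%, so even a chat that misses most of the time can come out ahead, while a shared prefix that hits almost every time pays close to the read price alone.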