Comment by Kuinox
19 hours ago
Prompt caching would lower the cost, and later similar techniques would lower the inference cost too. You have less than 25 tokens; that's between $1 and $5.
There may be some use cases, but I'm not convinced by the one you gave.
So there's a bit of an issue with prompt caching implementations: for both the OpenAI API and Claude's API, you need a minimum of 1024 tokens in the prompt before the cache can be created at all. For simple problems, that can be hard to hit and may require padding the system prompt a bit.
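To make the point concrete, here is a minimal sketch of opting a long system prompt into caching with the Anthropic Python SDK. The model name, the placeholder prompt, and the padding assumption are illustrative; the key detail is that the cached prefix must tokenize to at least ~1024 tokens on most Claude models, otherwise the `cache_control` marker is effectively ignored.

```python
# Sketch: opting a system prompt into Anthropic prompt caching.
# Assumes the `anthropic` SDK and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

# Hypothetical long system prompt; it must reach the ~1024-token minimum
# cacheable length, which is why short prompts may need padding.
LONG_SYSTEM_PROMPT = "..."

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks the prefix up to this block as cacheable for reuse
            # on subsequent requests with the same prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Solve this simple problem..."}],
)

# usage.cache_creation_input_tokens and usage.cache_read_input_tokens
# indicate whether the cache was written or hit on this request.
print(response.usage)
```

On the OpenAI side there is nothing to opt into: prompts of 1024 tokens or more are cached automatically, and shorter ones simply never benefit, which is the same padding problem in a different shape.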