
Comment by Kuinox

1 day ago

Prompt caching would lower the cost, and later similar tech would lower the inference cost too. You have less than 25 tokens; that's between $1 and $5.

There may be some use cases, but I'm not convinced by the one you gave.

So there's a bit of an issue with prompt caching implementations: for both the OpenAI API and Claude's API, you need a minimum of 1024 tokens in the prompt before a cache entry gets created, for whatever reason. For simple problems, that can be hard to hit and may require padding the system prompt a bit.
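To make the threshold concrete, here's a minimal sketch assuming the Anthropic Python SDK (the model alias, file path, and prompts are placeholders, not anything from the discussion above). Anthropic needs an explicit `cache_control` marker on the block you want cached, whereas OpenAI caches automatically once the prompt crosses 1024 tokens:

```python
# Minimal sketch using the Anthropic Python SDK; model name and file path are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The system prompt must tokenize to at least ~1024 tokens; below that the
# cache_control marker is silently ignored and no cache entry is written,
# which is why short prompts sometimes get padded.
long_system_prompt = open("system_prompt.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": long_system_prompt,
            "cache_control": {"type": "ephemeral"},  # cache everything up to this block
        }
    ],
    messages=[{"role": "user", "content": "Solve this simple problem..."}],
)

# usage reports cache_creation_input_tokens / cache_read_input_tokens, so you can
# check whether the cache was actually written or hit on a given call.
print(response.usage)
```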