Comment by Deathmax
18 hours ago
Vertex's offering of Gemini very much does implicit caching, and has always been the case [1]. The recent addition of applying implicit cache hit discounts also works on Vertex, as long as you don't use the `global` endpoint and hit one of the regional endpoints.
[1]: http://web.archive.org/web/20240517173258/https://cloud.goog..., "By default Google caches a customer's inputs and outputs for Gemini models to accelerate responses to subsequent prompts from the customer. Cached contents are stored for up to 24 hours."
No comments yet
Contribute on Hacker News ↗