Comment by Talderigi

5 hours ago

Curious how the semantic caching layer works.. are you embedding requests on the gateway side and doing a vector similarity lookup before proxying? And if so, how do you handle cache invalidation when the underlying model changes or gets updated?

1 comment

Talderigi

giorgi_pro 5 hours ago

Hey, contributor here. That's right, GoModel embeds requests and does vector similarity lookup before proxying. Regarding the cache invalidation, there is no "purging" involved – the model is part of the namespace (params_hash includes the LLM model, path, guardrails hash, etc). TTL takes care of the cleanup later.