Comment by Traubenfuchs
6 days ago
I bet there is a set of repetitive one- or two-question user requests that makes up a sizeable share of all requests. The models are so expensive to run that even if those requests were only 1% of traffic, caching them would be worth it, and far less than 1% would still pay off. To make it less obvious, they probably keep a large set of response variants. I don't see why they wouldn't do this.
They probably also have cheap code or cheap models that normalize requests to increase cache hit rate.
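A minimal Python sketch of the kind of cache speculated about above. It assumes a hypothetical `normalize` step (lowercasing, stripping punctuation, collapsing whitespace) and a hypothetical `ResponseCache` that stores a few response variants per normalized prompt; `model` stands in for the expensive call. This only illustrates the idea and does not reflect any provider's actual system.

```python
import random
import re
from typing import Callable, Dict, List, Optional


def normalize(prompt: str) -> str:
    """Hypothetical cheap normalizer: lowercase, drop punctuation, collapse whitespace."""
    text = prompt.lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove anything that is not a word char or space
    return " ".join(text.split())


class ResponseCache:
    """Hypothetical cache mapping normalized prompts to several stored answers."""

    def __init__(self, model: Callable[[str], str], variants_per_key: int = 3,
                 seed: Optional[int] = None):
        self.model = model                      # the expensive call we want to avoid
        self.variants_per_key = variants_per_key
        self.store: Dict[str, List[str]] = {}
        self.rng = random.Random(seed)
        self.hits = 0
        self.misses = 0

    def ask(self, prompt: str) -> str:
        key = normalize(prompt)
        variants = self.store.setdefault(key, [])
        if len(variants) >= self.variants_per_key:
            # Enough variants collected: serve one at random instead of calling the model.
            self.hits += 1
            return self.rng.choice(variants)
        # Still collecting variants: call the model and remember its answer.
        self.misses += 1
        answer = self.model(prompt)
        variants.append(answer)
        return answer
```

With `variants_per_key=2`, the first two phrasings of the same question reach the model and every later match is served from one of the two stored answers. A real normalizer would need to be much smarter; here "What's" and "What is" still produce different keys.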