Comment by mh-
6 days ago
Yes. And batched inference is a thing, where intelligent grouping/bin packing and routing of requests happens. I expect a good amount of "secret sauce" is at this layer.
Here's an entry-level link I found quickly on Google, OP: https://medium.com/@wearegap/a-brief-introduction-to-optimiz...
No comments yet
Contribute on Hacker News ↗