Comment by diggan
1 day ago
> despite clear demand for that from developers
Theorizing about why that is: could it be that they can't do deterministic inference and batching at the same time? If so, they might be avoiding it because it would mean giving up batching, which would send costs way up.
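To make the batching angle concrete, here is a minimal sketch of one way batching and determinism can conflict. It assumes the grouping of a floating-point reduction changes with batch shape, which is an assumption for illustration, not a confirmed description of any provider's stack. Floating-point addition is not associative, so the same numbers summed in different groupings can give different results. `chunked_sum` and the input values are hypothetical, and the loops are written out by hand rather than using the built-in `sum`, which newer Python versions compensate for rounding error.

```python
def chunked_sum(values, chunk_size):
    """Sum `values` by first reducing fixed-size chunks, then adding the partials.

    Hypothetical stand-in for a reduction whose grouping depends on
    how the work was batched or tiled.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v  # plain left-to-right float addition
        partials.append(partial)

    total = 0.0
    for p in partials:
        total += p
    return total


# Same four numbers, two different groupings. The exact sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]

one_chunk = chunked_sum(values, 4)   # ((1e16 + 1.0) + -1e16) + 1.0
two_chunks = chunked_sum(values, 2)  # (1e16 + 1.0) + (-1e16 + 1.0)

print(one_chunk, two_chunks)  # 1.0 0.0
```

Neither grouping gets the exact answer, and they disagree with each other. If the grouping really does shift with whatever else happens to share the batch, identical requests could see slightly different numbers, and a tiny difference in a logit can flip which token gets picked.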