Comment by jjmarr
8 hours ago
You can increase LLM inference throughput by splitting work into smaller batches and issuing them in parallel, but in practice the gains are sub-linear. It probably isn't worth it unless your model provider makes it really easy.
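A minimal sketch of that idea in Python, assuming an async provider client. `call_model`, `batch_size`, and `max_concurrency` are hypothetical stand-ins rather than any particular provider's API; the point is that throughput grows with concurrency only up to rate limits and scheduling overhead, which is where the sub-linear scaling shows up.

```python
import asyncio

# Hypothetical stand-in for a real model API call; any async client
# (e.g. an HTTP POST to your provider's endpoint) would slot in here.
async def call_model(batch: list[str]) -> list[str]:
    await asyncio.sleep(0.1)  # simulate network + inference latency
    return [f"completion for: {p}" for p in batch]

async def run_in_small_batches(prompts: list[str], batch_size: int = 4,
                               max_concurrency: int = 8) -> list[str]:
    # Split the workload into small batches and issue them concurrently;
    # a semaphore caps in-flight requests so we don't blow past rate limits.
    sem = asyncio.Semaphore(max_concurrency)
    batches = [prompts[i:i + batch_size]
               for i in range(0, len(prompts), batch_size)]

    async def worker(batch: list[str]) -> list[str]:
        async with sem:
            return await call_model(batch)

    results = await asyncio.gather(*(worker(b) for b in batches))
    # Flatten per-batch results back into one list, preserving order.
    return [r for batch_result in results for r in batch_result]

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(20)]
    print(asyncio.run(run_in_small_batches(prompts)))
```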