Comment by jjmarr

6 hours ago

You can increase per-request LLM inference throughput (tokens per second for a single request) by using smaller batch sizes, but that scales non-linearly in practice. It probably isn't worth it unless your model provider makes it really easy.
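
To make the non-linearity concrete, here is a toy model, not a measurement. It assumes each decode step costs a fixed amount of time (for example, reading the weights) plus a small amount per sequence in the batch. The function names and the constants `fixed_ms` and `per_seq_ms` are made up for illustration and don't come from any real serving stack.

```python
def step_time_ms(batch_size, fixed_ms=20.0, per_seq_ms=0.5):
    """Toy decode-step latency: a fixed cost plus a per-sequence cost.

    Both constants are illustrative assumptions, not measured values.
    """
    return fixed_ms + per_seq_ms * batch_size


def per_request_tps(batch_size, **kwargs):
    """Tokens per second seen by one request (one token per decode step)."""
    return 1000.0 / step_time_ms(batch_size, **kwargs)


def aggregate_tps(batch_size, **kwargs):
    """Tokens per second across every request in the batch."""
    return batch_size * per_request_tps(batch_size, **kwargs)


if __name__ == "__main__":
    for batch_size in (64, 32, 16, 8):
        print(
            f"batch={batch_size:3d}  "
            f"per-request={per_request_tps(batch_size):6.1f} tok/s  "
            f"aggregate={aggregate_tps(batch_size):7.1f} tok/s"
        )
```

Under these assumptions, halving the batch size makes each request faster by less than 2x, while the total tokens per second the hardware serves goes down. Real systems have more moving parts, so treat this only as intuition for the shape of the trade-off.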