Comment by jjmarr
8 hours ago
You can increase LLM inference throughput by splitting work into smaller batches and issuing them in parallel, but in practice the gains are sub-linear. It probably isn't worth it unless your model provider makes it really easy.
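A minimal sketch of that idea in Python, assuming an async provider client. `call_model`, `batch_size`, and `max_concurrency` are hypothetical stand-ins rather than any particular provider's API; the point is that throughput grows with concurrency only up to rate limits and scheduling overhead, which is where the sub-linear scaling shows up.

```python
import asyncio

# Hypothetical stand-in for a real model API call; any async client
# (e.g. an HTTP POST to your provider's endpoint) would slot in here.
async def call_model(batch: list[str]) -> list[str]:
    await asyncio.sleep(0.1)  # simulate network + inference latency
    return [f"completion for: {p}" for p in batch]

async def run_in_small_batches(prompts: list[str], batch_size: int = 4,
                               max_concurrency: int = 8) -> list[str]:
    # Split the workload into small batches and issue them concurrently;
    # a semaphore caps in-flight requests so we don't blow past rate limits.
    sem = asyncio.Semaphore(max_concurrency)
    batches = [prompts[i:i + batch_size]
               for i in range(0, len(prompts), batch_size)]

    async def worker(batch: list[str]) -> list[str]:
        async with sem:
            return await call_model(batch)

    results = await asyncio.gather(*(worker(b) for b in batches))
    # Flatten per-batch results back into one list, preserving order.
    return [r for batch_result in results for r in batch_result]

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(20)]
    print(asyncio.run(run_in_small_batches(prompts)))
```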