Comment by tarruda
10 hours ago
I'm only interested in the local, single-user use case. Plus, I use a Mac Studio for inference, so vLLM is not an option for me.
10 hours ago
You can still get concurrency gains [0] in a local, single-user (multi-agent) use case by running vLLM on your Mac Studio.
[0] https://youtu.be/Ze5XLooTt6g?t=658