Comment by manmal
1 year ago
The top comment in this thread mentions a 6k setup, which could likely be used with vLLM with a bit more tinkering. AFAIK vLLM's batched inference is great.
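For reference, a minimal sketch of what batched inference with vLLM's offline API looks like; the model name, prompts, and sampling settings here are assumptions for illustration, not from the comment or the linked thread:

```python
# Minimal sketch of vLLM offline batched inference.
# Model name and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the benefits of batched inference.",
    "Explain what a KV cache is in one sentence.",
    "List three open-weight LLMs.",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM schedules all prompts together (continuous batching),
# so throughput is much better than looping over single requests.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```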