Comment by yencabulator
6 months ago
And now you need a server per model? Ollama loads models on demand and unloads them after an idle timeout, all accessible over the same HTTP API.
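For anyone curious, here's a minimal sketch of what that looks like against Ollama's default local endpoint. The two model names are just examples and must already be pulled; the `keep_alive` field is the knob that controls the idle unload the comment mentions.

```python
# Minimal sketch (Python stdlib only): two different models served
# through the same Ollama HTTP endpoint on the default port (11434).
# Assumes a local Ollama instance and that "llama3" and "mistral"
# have already been pulled -- both names are just examples.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(model: str, prompt: str) -> str:
    """Send one non-streaming generate request. Ollama loads the model
    on first use and unloads it after the keep_alive idle window."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        # How long the model stays resident after this request
        # (Ollama's default is around 5 minutes).
        "keep_alive": "5m",
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Same server, same API, two different models loaded on demand.
print(generate("llama3", "Say hi in five words."))
print(generate("mistral", "Say hi in five words."))
```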