Slacker News

Comment by apitman

7 months ago

How does multi-user work, and how many users could it handle concurrently? My only experience is running much smaller models, and they easily peg my GPU at ~90 tokens/s. So maybe I could run 5-10 users at <10 tokens/s each? Does software like llama.cpp and ollama handle this?
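
As a rough check on that estimate, here is a minimal sketch that treats the ~90 tokens/s figure as a fixed aggregate budget split evenly across users. That even split is a simplifying assumption, not a description of how any particular server schedules requests, and the function name `naive_per_user_rate` is hypothetical. Servers that batch concurrent requests can often do better in aggregate than this naive division suggests.

```python
def naive_per_user_rate(total_tokens_per_s: float, users: int) -> float:
    """Evenly split a fixed aggregate decode rate across concurrent users.

    Assumption: total throughput stays constant as users are added,
    which ignores any gains from batching requests together.
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    return total_tokens_per_s / users


if __name__ == "__main__":
    # ~90 tokens/s is the single-user figure quoted in the comment.
    for users in (1, 5, 10):
        rate = naive_per_user_rate(90.0, users)
        print(f"{users:>2} users -> {rate:.1f} tokens/s each")
```

Under this assumption, 5 users would get 18 tokens/s each and 10 users would get 9 tokens/s each, so only the upper end of the 5-10 range falls below 10 tokens/s.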
