Comment by zozbot234
7 hours ago
10 tok/s is quite fine for chatting, though less so for interaction with agentic workloads. So the technique itself is still worthwhile for running a huge model locally.
7 hours ago
10 tok/s is quite fine for chatting, though less so for interaction with agentic workloads. So the technique itself is still worthwhile for running a huge model locally.
No comments yet
Contribute on Hacker News ↗