Comment by Octoth0rpe
19 hours ago
> A single patched llama-server runs on K3s, providing both generation with speculative decoding (~100 tok/s)
There seems to be at least some detail on that point.