Comment by lambda
3 hours ago
That sounds high for a Strix Halo with a dense 27B model. When you quote tokens per second, are you talking about prefill (prompt eval, which can happen in parallel) or decode (generation)? Usually if people quote only one number, they're quoting generation speed, and I would be surprised if you got that for generation on a Strix Halo.
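For a rough sense of why a high generation number would be surprising: generation on a dense model is usually limited by memory bandwidth, since every weight has to be read once per generated token. Here is a back-of-the-envelope sketch of that ceiling. The 256 GB/s figure is an assumed theoretical peak for Strix Halo, the bytes-per-parameter values are assumed round numbers for common quantizations, and the function name is made up for illustration. It ignores KV cache reads and other overhead, so real numbers should come in lower.

```python
def max_decode_tokens_per_sec(params_billion: float,
                              bytes_per_param: float,
                              bandwidth_gb_s: float) -> float:
    """Rough upper bound on generation speed for a dense model,
    assuming decode is purely memory-bandwidth bound."""
    # Size of the weights in GB; all of them are read once per generated token.
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb


# Assumption: theoretical peak memory bandwidth, not a measured value.
ASSUMED_BANDWIDTH_GB_S = 256.0

# Assumption: idealized bytes per parameter; real quantized files are a bit larger.
for label, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    bound = max_decode_tokens_per_sec(27, bytes_per_param, ASSUMED_BANDWIDTH_GB_S)
    print(f"27B dense, {label}: at most about {bound:.1f} tok/s")
```

Prefill doesn't hit this ceiling in the same way, because the prompt tokens are processed in a batch and the weights are reused across them, which is why the two numbers can differ so much.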