Comment by Galanwe
1 month ago
My advice: don't just look at tokens per second, but also at time to first token (TTFT).
The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second but horrible TTFT.
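If you want to track both numbers yourself, here is a minimal sketch, assuming your runtime gives you a streaming iterator that yields one token (or chunk) at a time. The function name `measure_stream` and the injectable `clock` parameter are made up for illustration and are not part of any particular library.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (ttft_seconds, decode_tokens_per_second) for one streamed reply.

    Hypothetical helper: `tokens` is assumed to be a streaming iterator,
    so the time spent waiting for the first item covers prompt processing.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # the first token arrived: this marks TTFT
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # Decode speed is measured over the tokens after the first one,
    # so a slow prefill does not get averaged into tokens per second.
    tps = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return ttft, tps
```

Call `measure_stream(...)` right where you start consuming the stream, so the clock starts as close to the request as possible. Reporting the two values separately is the point: a single averaged tokens per second figure hides a long wait before the first token.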