Comment by chc4
9 hours ago
it just me that thinks its kinda weird that they conflate speed in tokens/second and latency, when i think of latency as time to first token? like it generates an entire paragraph of tokens faster but wouldnt it still be slower if your reply is only 1 word because it has to do the entire 256 tokens as a chunk
No comments yet
Contribute on Hacker News ↗