Comment by antirez
1 hour ago
Prefill is 400 t/s on that hardware. It's just that if the prompt is very short, you can't see the real speed, because it will default to single-token context processing.