Comment by antirez
2 hours ago
Prefill is 400 t/s on that hardware. It's just that if the prompt is very short you can't see the real speed, and it will default to single-token context processing.
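To illustrate the point, here is a rough toy model of why a short prompt hides the real prefill speed. Only the 400 t/s figure comes from the comment above. The single-token rate, the batching threshold, and the fixed overhead are made-up assumptions for illustration, and `measured_prefill_rate` is a hypothetical helper, not part of any real inference engine.

```python
def measured_prefill_rate(
    prompt_tokens: int,
    batched_rate: float = 400.0,      # t/s, the figure quoted in the comment
    single_token_rate: float = 20.0,  # t/s, ASSUMED speed of the one-token-at-a-time path
    min_batch_tokens: int = 32,       # ASSUMED threshold below which batching is skipped
    fixed_overhead_s: float = 0.05,   # ASSUMED per-request setup cost of the batched path
) -> float:
    """Return the tokens/s a benchmark would report for one prompt (toy model)."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")

    if prompt_tokens < min_batch_tokens:
        # Very short prompt: processed token by token, so the batched speed never shows up.
        seconds = prompt_tokens / single_token_rate
    else:
        # Batched prefill: the fixed overhead is amortized as the prompt grows.
        seconds = fixed_overhead_s + prompt_tokens / batched_rate

    return prompt_tokens / seconds


if __name__ == "__main__":
    for n in (8, 64, 512, 4096):
        print(f"{n:>5} prompt tokens: {measured_prefill_rate(n):7.1f} t/s")
```

Under these assumptions the reported number only gets close to the batched rate once the prompt is long enough, so prefill should be measured with a prompt of at least a few hundred tokens.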