Comment by Balinares
3 days ago
Isn't that exactly how draft models speed up inference, though? Validating a batch of tokens is significantly faster than generating them.
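For anyone who wants the mechanics spelled out, here is a toy sketch of one greedy speculative decoding step. Everything in it is an assumption made for illustration: `target`, `draft`, and `speculative_step` are hypothetical names, the "models" are plain functions over integer tokens, and the verification loop stands in for what a real transformer would do in a single batched forward pass.

```python
from typing import Callable, List

# Hypothetical model type: maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def target(ctx: List[int]) -> int:
    """Toy 'large' model: always counts up modulo 10."""
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    """Toy 'small' model: agrees with target except right after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


def speculative_step(target: Model, draft: Model,
                     context: List[int], k: int = 4) -> List[int]:
    # 1. The draft model proposes k tokens, one at a time (cheap, sequential).
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target model checks every proposed position. In a real
    #    transformer all k checks come from one batched forward pass;
    #    this loop only simulates that.
    accepted: List[int] = []
    for i in range(k):
        expected = target(context + proposed[:i])
        if expected != proposed[i]:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed[i])

    # 3. Every proposal matched, so the same pass also yields a bonus token.
    accepted.append(target(context + proposed))
    return accepted


print(speculative_step(target, draft, [0], k=4))  # draft goes wrong on its 4th token
print(speculative_step(target, draft, [5], k=3))  # draft is right throughout
```

In this greedy setup the accepted tokens are the same ones the target would have produced on its own. Any speedup comes from step 2 costing roughly one target pass instead of k sequential ones.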