Comment by monster_truck
12 days ago
Even if it wasn't outright beneficial for decoding by itself, it would still allow you to connect a second machine running a smaller, more heavily quantized version of the model for speculative decoding which can net you >4x without quality loss
No comments yet
Contribute on Hacker News ↗