Comment by woadwarrior01
10 months ago
That should not be the case. Speculative decoding is trading off compute for memory bandwidth. The model's output is guaranteed to be the same, with or without it. Perhaps there's a bug in the implementation that you're using.
No comments yet
Contribute on Hacker News ↗