Comment by vhiremath4

19 hours ago

So this is like branch prediction for operating systems? Except we have probability baked into the model itself so it’s even more reliable.

3 comments

Lihh27 17 hours ago

similar idea, but the failure mode is better. a branch mispredict burns cycles. a bad guess here usually just means no bonus tokens. https://arxiv.org/abs/2211.17192
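To make the "no bonus tokens" point concrete, here is a minimal sketch of one draft-then-verify round. It assumes greedy decoding, which is a simplification: the linked paper uses a rejection-sampling acceptance rule for sampled outputs. `Model`, `speculative_step`, and the toy models are hypothetical names for illustration, not any library's API.

```python
from typing import Callable, List

# Hypothetical interface: a model maps a token context to its greedy next token.
Model = Callable[[List[int]], int]

def speculative_step(target: Model, draft: Model, ctx: List[int], k: int) -> List[int]:
    """Return the tokens accepted in one draft-then-verify round (greedy variant)."""
    # The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(ctx + proposed))

    # The target checks each proposed position. A real system does this in one
    # batched forward pass; the loop here is only for clarity.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(ctx + accepted)
        if tok != expected:
            # Mismatch: keep the target's own token and drop the remaining guesses.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every guess matched, so the same pass also yields one extra token.
    accepted.append(target(ctx + accepted))
    return accepted

# Toy models (assumptions for the demo): the target counts upward mod 10.
target = lambda ctx: (ctx[-1] + 1) % 10
good_draft = lambda ctx: (ctx[-1] + 1) % 10   # always agrees with the target
bad_draft = lambda ctx: 0                     # wrong for this context

print(speculative_step(target, good_draft, [3], 3))  # four tokens from one round
print(speculative_step(target, bad_draft, [3], 3))   # one token, same as plain decoding
```

Even with the bad draft, the round still emits the token the target would have produced on its own, so the output is unchanged and only the draft's work is wasted.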

TOMDM 14 hours ago

As long as you're not bound on parallelism or bandwidth, it's "free". If you're constrained on either resource, your lighter predictor model just needs to save you more cycles on average than it consumes.
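A rough way to put numbers on that break-even, as a sketch under stated assumptions: each drafted token is accepted independently with probability `alpha`, the draft costs `draft_cost` target-steps per token, and verifying a whole round costs `verify_cost` target-steps (1.0 when parallelism really is free). `verify_cost` is a hypothetical knob added here to model the constrained case; all names are illustrative.

```python
def expected_tokens(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per round with gamma drafted tokens.

    Assumes each drafted token is accepted independently with probability alpha.
    """
    if alpha >= 1.0:
        return float(gamma + 1)
    # Geometric sum 1 + alpha + ... + alpha**gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)

def speedup(alpha: float, gamma: int, draft_cost: float, verify_cost: float = 1.0) -> float:
    """Tokens per unit cost, relative to plain decoding (1 token per target step).

    draft_cost: cost of one draft token, in units of one target step.
    verify_cost: cost of verifying the whole round, in the same units
                 (hypothetical parameter; above 1.0 models a resource-bound system).
    """
    round_cost = gamma * draft_cost + verify_cost
    return expected_tokens(alpha, gamma) / round_cost

# Unconstrained: verification is as cheap as one normal step, so drafting wins.
print(speedup(alpha=0.8, gamma=4, draft_cost=0.05))
# Constrained: verification costs four steps, so the same draft is a net loss.
print(speedup(alpha=0.8, gamma=4, draft_cost=0.05, verify_cost=4.0))
```

A result above 1.0 means the predictor saves more than it consumes; below 1.0 it is a net loss.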

dchftcs 9 hours ago

A bad guess still costs cycles, but as things stand the penalty is smaller than a branch mispredict. If we add some kind of pipelining, though, such as downstream work that assumes the speculative decode is correct, a wrong guess becomes expensive again.