
Comment by rfoo

1 day ago

> It's known that such tricks reduce accuracy

AFAIU, speculative decoding (and this fancier version of spec. decoding) does not reduce accuracy.

No, it shouldn't. "All" you're doing is having a small model draft a run of tokens and then having the large model "verify" them. At the first token where the large model diverges from the small one's draft, you keep the large model's token, throw away the rest of the draft, and restart the process from there.
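The loop above can be sketched in a few lines. This is a toy illustration under greedy decoding, with hypothetical stand-in "models" (plain Python functions, not real LLMs); real implementations verify the whole draft in one parallel forward pass and use a rejection-sampling acceptance rule so that *sampled* outputs also match the target model's distribution. The point of the sketch is that the speculative output is token-for-token identical to running the large model alone, which is why accuracy is unaffected:

```python
def small_model(ctx):
    # Cheap draft model: next token is last + 1, but it guesses wrong after 5.
    last = ctx[-1]
    return last + 2 if last == 5 else last + 1

def large_model(ctx):
    # Expensive target model: next token is always last + 1.
    return ctx[-1] + 1

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    target_len = len(prompt) + n_tokens
    while len(out) < target_len:
        # 1. Draft k tokens with the small model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = small_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify with the large model; on the first divergence,
        #    keep the large model's token and discard the rest.
        accepted, ctx = [], list(out)
        for t in draft:
            expected = large_model(ctx)
            if t != expected:
                accepted.append(expected)
                break
            accepted.append(t)
            ctx.append(t)
        out.extend(accepted[:target_len - len(out)])
    return out

def plain_decode(prompt, n_tokens):
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(large_model(out))
    return out

print(speculative_decode([0], 8) == plain_decode([0], 8))  # True
```

Even though the draft model is wrong at one position, the verified output matches the large model exactly; speculation only changes how fast you get there.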

It’s quantization which is crippling accuracy…

  • People all over this subthread are saying that with no evidence provided. The company says it doesn't — which would be pretty embarrassing to have to walk back — so who's saying it does?