Comment by dvdkon

4 hours ago

As far as I know, speculative decoding still verifies that the proposed tokens are what the "big" model would generate; it just uses the guesses to make that process faster. Setting the probability threshold too low shouldn't affect correctness then, only speed (time gets wasted verifying bad guesses).
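To make that concrete, here is a rough sketch of the verify-then-accept loop, assuming greedy decoding and toy stand-in "models" that are just deterministic functions. All names here (`target_model`, `good_draft`, `bad_draft`, `decode_speculative`) are made up for illustration and are not any library's API. A real implementation would verify all proposed tokens in one batched forward pass of the big model, which is where the speedup comes from; this sketch only shows why the output does not depend on the quality of the guesses.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # context -> next token (greedy pick)

def target_model(ctx: List[int]) -> int:
    # Hypothetical "big" model: an arbitrary deterministic toy rule.
    return (sum(ctx) * 31 + len(ctx)) % 50

def good_draft(ctx: List[int]) -> int:
    # Hypothetical draft that agrees with the target most of the time.
    tok = target_model(ctx)
    return tok if len(ctx) % 4 else (tok + 1) % 50

def bad_draft(ctx: List[int]) -> int:
    # Hypothetical draft whose guesses never match (target tokens are 0..49).
    return -1

def decode_plain(target: Model, prompt: List[int], n: int) -> List[int]:
    # Ordinary decoding: one target call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]

def decode_speculative(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    # Returns the generated tokens and the number of discarded guesses.
    out = list(prompt)
    rejected = 0
    while len(out) - len(prompt) < n:
        # 1. The draft cheaply proposes k tokens.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each guess; keep the longest matching prefix.
        for i, tok in enumerate(proposal):
            expected = target(out)
            if tok != expected:
                out.append(expected)            # fall back to the target's token
                rejected += len(proposal) - i   # this guess and the rest are wasted
                break
            out.append(tok)
        else:
            out.append(target(out))             # all accepted: one extra target token
    return out[len(prompt):len(prompt) + n], rejected

prompt = [1, 2, 3]
plain = decode_plain(target_model, prompt, 20)
good_tokens, good_rejected = decode_speculative(target_model, good_draft, prompt, 20)
bad_tokens, bad_rejected = decode_speculative(target_model, bad_draft, prompt, 20)
print(good_tokens == plain, bad_tokens == plain)  # same text either way
print(good_rejected, bad_rejected)                # only the wasted work differs
```

Every token that ends up in the output equals what the target would have picked at that position, so a worse draft only increases the number of discarded guesses. With sampling instead of greedy decoding the acceptance rule is probabilistic rather than an exact match, which this sketch does not cover.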

lreeves 3 hours ago

But won't setting it to accept 100% of the proposed tokens skip the verification?