Comment by _neil
10 months ago
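A draft model is something you have to explicitly enable. It's a smaller model that speculatively generates the next few tokens, which the main model then checks and either accepts or rejects. In theory this speeds up generation, since accepted guesses save the main model from producing those tokens one at a time.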
Here are the LM Studio docs on it: https://lmstudio.ai/docs/app/advanced/speculative-decoding
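
To make the idea concrete, here's a rough toy sketch in Python. It is not how LM Studio implements it: the two "models" are made-up functions (`target_model`, `draft_model`) over integer tokens, and acceptance is simple greedy matching rather than the probabilistic acceptance used when sampling. All names and the `k` parameter are assumptions for illustration only.

```python
from typing import Callable, List, Tuple

# A "model" here is a hypothetical function: token prefix -> next token (greedy).
Model = Callable[[List[int]], int]

def target_model(prefix: List[int]) -> int:
    # Toy stand-in for the large model: sum of the last two tokens, mod 10.
    return (prefix[-1] + prefix[-2]) % 10

def draft_model(prefix: List[int]) -> int:
    # Toy stand-in for the small model: agrees with the target
    # except that it gets the answer wrong whenever the target would say 7.
    guess = (prefix[-1] + prefix[-2]) % 10
    return 0 if guess == 7 else guess

def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0  # each round stands in for one batched pass of the big model
    while len(tokens) < goal:
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks every proposed position.
        verify_rounds += 1
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] != expected:
                # Keep the matching prefix, then take the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every guess matched: keep them all, plus one extra target token.
            tokens.extend(proposal)
            tokens.append(target(tokens))
    return tokens[:goal], verify_rounds

spec_tokens, rounds = speculative_decode(target_model, draft_model, [1, 1], n_new=12)
plain_tokens = greedy_decode(target_model, [1, 1], n_new=12)
print(spec_tokens == plain_tokens, rounds)
```

The point of the sketch is that the output matches what the big model would have produced on its own, while the big model is consulted in fewer rounds. Whether that turns into a real speedup depends on how often the draft's guesses are accepted.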