Comment by sorenjan
8 hours ago
Doesn't "real" distillation use the logits instead of the final tokens? I would classify this more like using a model to generate synthetic training data.
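To make the distinction concrete, here is a minimal sketch in plain Python. The function names, the example logits, and the temperature of 2.0 are all hypothetical and chosen only for illustration. It contrasts a loss computed against the teacher's full output distribution (logit-based distillation) with a loss computed against a single token the teacher emitted (training on generated text). Greedy decoding stands in for sampling, and the temperature-squared scaling some formulations apply is left out.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # the student sees the teacher's probability for every token.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def hard_token_loss(token_id, student_logits):
    # Cross-entropy against one token the teacher produced:
    # the student only sees which token was chosen.
    q = softmax(student_logits)
    return -math.log(q[token_id])

# Hypothetical logits over a three-token vocabulary.
teacher_logits = [2.0, 1.5, -1.0]
student_logits = [0.5, 0.2, 0.1]

# Greedy decoding as a stand-in for the teacher's "final token".
teacher_token = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)

kd = logit_distillation_loss(teacher_logits, student_logits)
ce = hard_token_loss(teacher_token, student_logits)
print(f"logit distillation (KL): {kd:.4f}")
print(f"hard-token cross-entropy: {ce:.4f}")
```

In the hard-token case, the information that the second token was nearly as likely as the first is lost, which is why it looks more like supervised training on synthetic data.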