Comment by Philpax
21 hours ago
It wouldn't be data distillation: instead, it would be teacher-student distillation. The teacher model has stronger representations that the student can mimic, which would give it more capability over training on the data itself.
No comments yet
Contribute on Hacker News ↗