Comment by a1j9o94 3 days ago
You would only use the base model during training. This is a distillation technique.