Comment by janalsncm

7 months ago

The bitter-er lesson is that distillation from bigger models works pretty damn well. It’s great news for the GPU poor, not great for the guys training the models we distill from.

2 comments

janalsncm

rfv6723 7 months ago

Distillation is great for researchers and hobbyists.

But nearly all frontier models have anti-distillation ToS, so distillation is out of question for western commercial companies like Apple.

janalsncm 7 months ago

Even if Apple needs to train an LLM from scratch, they can distill it and deploy on edge devices. From that point, inference is free to them.