Comment by looobay

3 days ago

There was research on LLM training and distillation showing that if two models share a similar architecture (probably the case for xAI), the "teacher" model will transmit traits to the student model even when those traits are not present in the distillation data. So they would probably need to train a new model from scratch.

(Sorry, I don't remember the name, but there was an example of a model liking owls to showcase this.)
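For context, the mechanism described above builds on ordinary knowledge distillation, where a student is trained to match the teacher's full output distribution rather than just its top answer. A minimal sketch (using NumPy, with made-up logits; the temperature-softened KL objective is the standard distillation loss, not the specific setup of the research in question):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; higher T spreads probability mass."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    The student is penalized for deviating from the teacher's *entire*
    distribution, not just its argmax -- one channel through which a
    teacher's idiosyncrasies can leak into a student, especially when
    the two share an architecture or initialization.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical logits over three tokens for illustration.
teacher   = [4.0, 1.0, 0.5]
matching  = [4.0, 1.0, 0.5]   # student agrees with teacher everywhere
diverging = [0.5, 1.0, 4.0]   # student prefers a different token

assert distillation_loss(matching, teacher) < 1e-9
assert distillation_loss(diverging, teacher) > distillation_loss(matching, teacher)
```

The point of the comment follows from this: because the loss pulls the student toward everything the teacher's distribution encodes, filtering the distillation data alone may not be enough to keep an unwanted trait out of the student.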