Comment by int_19h

3 months ago

Much like how this works for natural languages. It turns out that if you train a model on, e.g., vast quantities of English, teaching it other languages doesn't require nearly as much data, because it has already internalized all the "shared" parts (and there are a lot more of those than there are surface differences).
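As a minimal sketch of that transfer effect: start from a model pretrained on English-heavy data and continue training on a small corpus in a new language. The model name, corpus, and hyperparameters here are illustrative, not anyone's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small English-pretrained causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Tiny illustrative corpus in the "new" language (German here).
corpus = [
    "Die Katze sitzt auf der Matte.",
    "Der Hund schläft unter dem Tisch.",
]

model.train()
for epoch in range(3):
    for text in corpus:
        batch = tokenizer(text, return_tensors="pt")
        # Standard next-token objective; the structure already learned from
        # English (grammar-like regularities, shared concepts) is reused, so
        # the loss drops far faster than it would training from scratch.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```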

But, yes, it does mean that new things that break drastically with old practices are much harder to teach than incremental improvements.