Comment by dragonwriter
14 days ago
> The model wasn't trained on those languages (yet).
It probably has been trained on them (it was trained on 40 trillion tokens covering 200 languages, and they almost certainly didn't avoid CJK languages).
It has only been further fine-tuned on a set of 12 languages. (I wonder if that is the set the base Behemoth model, from which both were distilled, had been trained on at the time of distillation; Behemoth is apparently not completely finished, so perhaps there will be further revisions of the distilled models as it is completed.)