Comment by accrual
14 days ago
Isn't there a vast quantity of relevant information in CJK languages? I remember reading some models even "think" in other languages where there might be more detail before outputting in the target language.
14 days ago
Isn't there a vast quantity of relevant information in CJK languages? I remember reading some models even "think" in other languages where there might be more detail before outputting in the target language.
The model wasn't trained on those languages (yet). The only possible explanation is racism. The model is also racist against Russians and Icelanders.
> The model wasn't trained on those languages (yet).
It probably has been trained on them (it was trained on 40 trillion tokens covering 200 languages, they almost certainly didn't avoid CJK languages.
They only have been further fine-tuned on a set of 12 languages. (I wonder if that is the set the base Behemoth model both are distilled from had been trained on when they were distilled; Behemoth is apparently not completely finished, and perhaps there will be further revisions of the distilled models as it is.)