Comment by dingdingdang
5 days ago
Indeed. I've thought from the beginning that LLMs should focus on ONE language for exactly this reason: training on multiple languages produces mediocre, duplicated representations of the same data. Every language other than English essentially siphons off capacity (layers/weights) that could otherwise hold more genuine data/knowledge. As far as I can see, other languages shouldn't come into the picture at all. Dedicated translation LLMs and existing solutions can handle that aspect just fine, and there's no salient reason to fold partial multi-language capability in through fuzzy, unorganised training.