Comment by embedding-shape

3 days ago

But are you seriously under the belief that all of that, plus all the other things you're forgetting about, is easier, cheaper and faster than transcriptions and translations?

I understand and agree that building the LLMs yourself comes with more benefits, especially long-term ones, but it's still harder, more expensive and really time-consuming work.

I do not know which is easier. I am not sure it is even well established in research whether a translation-first or a native-language-first approach is more sample-efficient for generative text tasks.

But for a national lab, I think it is money well spent to figure out the possibilities and limitations of native-language LLMs for languages with on the order of 5M-10M speakers.