Comment by rafram

1 day ago

People on HN complain constantly about "open-source" models not releasing their training data. That's what the second point ("transparent") seems to be alluding to. And that's a bad thing?

Others have responded to your "diversity" point, but making sure to train on adequate amounts of data in all EU languages is valuable, especially because LLMs are so prone to generating convincing BS when working close to the edges of their training set. If this exists, people in Malta are going to want to use it, so it's better for it to generate good Maltese than gibberish that sort of looks like Maltese, right?