Comment by AnimalMuppet
15 days ago
All right, let's say that the baseline is "what is true". Then bias is departure from the truth.
That sounds great, right up until you try to do something with it. You want your LLM to be unbiased? So you're only going to train it on the truth? Where are you going to find that truth? Oh, humans are going to determine it? Well, first, where are you going to find unbiased humans? And, second, they're going to curate all the training data? How many centuries will that take? We're trying to train it in a few months.
And then you get to things like politics and sociology. What is the truth in politics? Yeah, I know, a bunch of politicians say things that are definitely lies. But did Obamacare go too far, or not far enough, or was it just right? There is no "true" answer to that. And yet, discussions about Obamacare may be more or less biased. How are you going to determine what that bias is when there isn't a specific thing you can point to and say, "That is true"?
So instead, they just train LLMs on a large chunk of the internet. Well, that includes things like the fine-sounding-but-completely-bogus arguments of flat earthers. In that environment, "bias" is "departure from average or median". That is the most it can mean. So truth is determined by majority vote of websites. That's not a very good epistemology.
The definition of the word doesn't answer to your opinion of it as an epistemology.
Also, you're just complaining about the difficulty of determining what is true. That's a separate problem, isn't it?
If we had an authoritative way of determining truth, then we wouldn't have the problem of curating material to train an LLM on. So no, I don't think it's a separate problem.
Again, the word "bias" and its definition exist outside the comparatively narrow concern of training LLMs.