
Comment by furyofantares

2 years ago

This is what the reply was:

> Oh, if it's squeaking then it's definitely going to float.

> It is a rubber duck.

> It is made of a material that is less dense than water.

Full points for saying if it's squeaking then it's going to float.

Full points for saying it's a rubber duck, with the implication that rubber ducks float.

Even with all that context though, I don't see how "it is made of a material that is less dense than water" scores any points at all.

Yeah, I think arguing about the logic behind these responses misses the point, since an LLM doesn't use any kind of logic--it just produces responses in patterns that mimic the way people respond. It says "it is made of a material that is less dense than water" because that is similar to what the samples in its training corpus say. It has no way to judge whether it is correct, or even what the concept of "correct" is.

When we're grading the "correctness" of these answers, we're really just judging the average correctness of Google's training data.

Maybe the next step in making LLMs more "correct" is not to give them more training data, but to find a way to remove the bad training data from the set?
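
Very roughly, I'm picturing something like the sketch below: score every example in the corpus and drop the ones below a threshold. `quality_score` here is a made-up placeholder, not a real API; an actual filter would be a trained classifier, heuristics, or cross-checks against trusted sources.

```python
def quality_score(example: str) -> float:
    # Placeholder heuristic just so the sketch runs: treat longer examples
    # as "better". A real filter would be a trained quality classifier or
    # agreement with trusted sources -- this is purely illustrative.
    return min(len(example) / 80.0, 1.0)

def filter_corpus(corpus: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only the examples whose estimated quality clears the threshold."""
    return [ex for ex in corpus if quality_score(ex) >= threshold]

corpus = [
    "short, low-information example",
    "a longer, more substantive example that would survive the cut under this toy heuristic",
]
print(filter_corpus(corpus))  # only the second example survives
```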