Comment by moffkalast
2 years ago
> That doesn't exactly mean that those representations are in any way correct. I may be anthropomorphizing too much here, but it feels exactly like asking someone who's done nothing but rote learning and watching them try to attach plausible-sounding reasons to things they fundamentally don't understand. There's an instant assumption that if the asker mentions something, it must be true.
Seems unlikely, given that the model does pretty well in novel situations. If you ask someone to come up with reasons for things they don't understand, you'd expect them to get it wrong pretty consistently.
I don't know that it's significant to say that a model's representations tend to be good enough to generalize but aren't perfect under the hood. That applies to humans too.
The claim was that these models aren't building representations of the underlying reasons for things. I guess I have no objection if you agree that they are, just that some of those reasons are incorrect.