Comment by godelski
9 hours ago
Yet people often forget this. We don't have mathematical models of truth, beauty, or many other abstract things, so we proxy them with "I know it when I see it." It's a good proxy for lack of anything better, but it also creates a known danger: the proxy helps models optimize for the answers we want, and if we're not incredibly careful they also optimize for deception.
This makes them frustrating and potentially dangerous tools. How do you validate a system optimized to deceive you? It takes a lot of effort! I don't understand why we are so cavalier about this.
No, the question is, how do you train the system so it doesn't deceive you?
That is a question of how to train future models. It needs to be answered. Answering the validation question will provide valuable insight into the training one. They are duals.