
Comment by foo12bar

7 hours ago

I've noticed AIs often try to hide failure by catching exceptions and returning a dummy value, perhaps with a log message buried among tons of extraneous other log messages. And the logs themselves are often over-abbreviated and missing the key data needed to debug what is actually happening.
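A minimal sketch of the pattern (the function names and price-parsing scenario are hypothetical, just to make the anti-pattern concrete):

```python
import logging

logger = logging.getLogger(__name__)

# Anti-pattern: swallow the exception and return a dummy value.
# The caller never learns the parse failed, and the log line is at
# DEBUG level and omits the offending input, so debugging is hopeless.
def parse_price_hidden(raw: str) -> float:
    try:
        return float(raw.strip().lstrip("$"))
    except Exception:
        logger.debug("parse issue")  # buried, no data
        return 0.0  # dummy value masquerading as success

# Honest version: let the failure surface, with the key data attached.
def parse_price(raw: str) -> float:
    try:
        return float(raw.strip().lstrip("$"))
    except ValueError as exc:
        raise ValueError(f"could not parse price from {raw!r}") from exc
```

The first version "works" on every input, which is exactly why it can score as a success under a naive evaluation, while the second makes the failure obvious and penalizable.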

I suspect AIs learned to do this to game the system. Bailing out with an exception is an obvious failure and will be penalized, but hiding a potential issue can sometimes be scored as a success.

I wonder how this extrapolates to general Q&A. Do models find ways to sound convincing enough to make the user feel satisfied and go away? I've noticed models often use "it's not X, it's Y", a binary framing designed to keep the user from thinking about other possibilities. They also often end their answer with a plan of action, a sales technique known as the "assumptive close", which gets the user thinking about the outcome after agreeing with the AI, rather than about the answer itself.

AI behavior is pretty easy to understand and predict if you view it through the lens of: they will shamelessly do anything and everything possible to game whatever metric they are trained on. Because... that's what hill-climbing a metric looks like. It's A/B enshittification taken to inscrutable heights.

They are trained on human feedback, so there is no other way this goes. Every bit of every response is pointed toward subversion of the assumed evaluator.