Comment by Lerc

4 hours ago

Perhaps, but it is a different matter when a reasoning system is deciding on the best way to achieve the goal.

To get the predicted disastrous effects you need to be doing function optimisation without regard to the meaning of the function parameters. Yes, models can still game the system at inference time, but much as when a human games the system, doing so requires awareness that you are going against the intent of some rule.
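
A rough toy sketch of the distinction above, in case it helps. Everything in it (the cleaning task, the action names, the numbers, and both chooser functions) is a made-up assumption for illustration, not a model of any real training setup. The blind optimiser only sees the score; the other chooser treats the score as a stand-in for what was actually wanted.

```python
# Hypothetical toy task: "keep the room tidy", scored by a proxy
# that only measures how much mess is visible. All values are invented.
ACTIONS = {
    "clean_room": {"visible_mess": 1, "actual_mess": 1},
    "cover_mess_with_rug": {"visible_mess": 0, "actual_mess": 10},
    "do_nothing": {"visible_mess": 10, "actual_mess": 10},
}


def proxy_score(outcome):
    """Proxy objective: less visible mess is better."""
    return -outcome["visible_mess"]


def blind_optimiser(actions):
    """Maximise the proxy with no regard to what its inputs mean."""
    return max(actions, key=lambda name: proxy_score(actions[name]))


def intent_aware_chooser(actions):
    """Assumed stand-in for reasoning about intent: discard actions where
    the measured quantity diverges from the thing it is meant to measure,
    then pick the best remaining action by the same proxy."""
    honest = {
        name: outcome
        for name, outcome in actions.items()
        if outcome["visible_mess"] == outcome["actual_mess"]
    }
    return max(honest, key=lambda name: proxy_score(honest[name]))


print(blind_optimiser(ACTIONS))
print(intent_aware_chooser(ACTIONS))
```

The intent-aware chooser could still pick the rug if you removed the filter, but it would have to do so knowing the rug defeats the point of the rule, which is the human-style gaming case.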