Comment by aurbano
1 year ago
Is AI the danger, or is our inability to simplify a problem down to an objective function the problem?
If anything, AI could help by "understanding" the real objective, so we don't have to code these simplified goals that ML models end up gaming, no?
Simplification is the problem here, arguably. Even a simple-sounding objective (say, a bicycle wheel that holds load the best) carries at least one implicit assumption - it will be handled and used in the real world. Which means it'll be subject to sloppy handling and thermal spikes and weather and abuse and all kinds of things beyond just meeting the goal. Any of those cheesy AI designs, if you were to 3D-print/replicate them, would fall apart as you picked them up. So the problem seems to be that the ML algorithm is given too simple a goal function - one lacking the "used in the real world" part.
I feel that a good first step would be to introduce some kind of random jitter into the simulation. Like, in the case of the wheels, introduce road bumps, and perhaps start each run by simulating dropping the wheel from a short distance. This should quickly weed out "too clever" solutions - as long as the jitter is random enough that RL won't pick up on it and start to exploit its non-randomness.
Speaking of road bumps: there is no such thing in reality as a perfectly flat road; if the wheel simulator is just rolling wheels on mathematically perfect roads, that's a big deviation from reality - precisely the kind that allows for "hacky" solutions that are not possible in the real world.
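To make that concrete, here's a minimal sketch of the randomized evaluation being suggested (essentially domain randomization). `simulate` is a placeholder for whatever wheel simulator is in use, assumed to return a scalar score (higher = better) under the given conditions; the specific perturbation ranges are made up for illustration:

```python
import random

def noisy_fitness(design, simulate, n_trials=20, seed=None):
    """Score a candidate design under many perturbed conditions, not one idealized run."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        conditions = {
            "bump_height": rng.uniform(0.0, 0.02),   # road is never perfectly flat (m)
            "drop_height": rng.uniform(0.05, 0.30),  # start each run by dropping the wheel (m)
            "load": rng.uniform(400.0, 800.0),        # applied load varies (N)
        }
        scores.append(simulate(design, **conditions))
    # Scoring the worst case penalises brittle "too clever" geometries
    # that only survive one exact set of conditions.
    return min(scores)
```

Using the worst-case (or even the mean) score across randomized trials is the simplest way to make "works only on a mathematically perfect road" solutions lose.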
You would have to introduce jitter along every possible dimension, and the dimensions themselves keep expanding (as illustrated by the bike wheel example). The combination of jitter x dimensions blows up exponentially into an undefined problem (AKA a theory of everything).
Humans don't simplify problems by reducing them to objective functions: we simplify them by reducing them to specific instances of abstract concepts. Human thought is fundamentally different to the alien processes of naïve optimising agents.
We do understand the "real objectives", and our inability to communicate this understanding to hill-climbing algorithms is a sign of the depth of our understanding. There's no reason to believe that anything we yet call "AI" is capable of translating our understanding into a form that, magically, makes the hill-climbing algorithm output the correct answer.
How would more AI help? "Given this goal with these parameters, figure out if another AI will ever game it into eventual thermonuclear war."
Feels halting problem-esque.
My point was that instead of blaming ML - or optimisation tools really - for gaming objective functions and coming up with non-solutions that do maximise reward, AI could instead be used to measure the reward/fitness of the solution.
So for the OP's example "optimise a bike wheel", an AI should in principle be able to judge whether a proposed wheel is good or not, in a similar way to a human.
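For what it's worth, a minimal sketch of that idea: swap the hand-written objective for a learned evaluator (e.g. a model trained to score wheels the way a person would) inside an otherwise ordinary optimisation loop. All the names here (`random_design`, `mutate`, `learned_evaluator`) are placeholders, not any real API:

```python
import random

def optimise(random_design, mutate, learned_evaluator,
             population=50, generations=100, seed=0):
    """Simple evolutionary loop where fitness comes from a learned evaluator,
    not a hand-coded reward formula."""
    rng = random.Random(seed)
    pool = [random_design(rng) for _ in range(population)]
    for _ in range(generations):
        # The learned model assigns the fitness score for each candidate.
        ranked = sorted(pool, key=learned_evaluator, reverse=True)
        parents = ranked[: population // 5]
        # Keep the best candidates and refill the pool with mutated offspring.
        pool = parents + [mutate(rng.choice(parents), rng)
                          for _ in range(population - len(parents))]
    return max(pool, key=learned_evaluator)
```

Whether this actually helps depends on the evaluator capturing the "used in the real world" part better than a simplified formula would, which is exactly the open question in this thread.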
>simplify a problem down to an objective function
Yes, though I have an intuition that this is NP-hard.