
Comment by viccis

1 year ago

I really wish I still had the link, but there used to be a website that listed a bunch of cases in which machine learning (mostly reinforcement learning) was used to teach a computer how to play a video game, and it ended up using perverse strategies that no human would use. Like exploiting weird glitches (https://www.youtube.com/watch?v=meE5aaRJ0Zs shows this with Q*bert)

Closest I've found to the old list I used to go to is this: https://heystacks.com/doc/186/specification-gaming-examples-...

In my thesis many years ago [0] I used EAs to build bicycle wheels. They were so annoyingly good at exploiting whatever idiosyncrasies existed in my wheel simulator. In the first iterations of my simulator, it managed to evolve wheels that would slowly oscillate, due to floating-point instability or something, and when forces were applied to them the oscillation would grow and grow until the whole simulator exploded and the recorded forces were all over the place, which of course out-competed every other wheel in at least some objective dimension.

After fixing those bugs, I mostly struggled with it taunting me. Like building a wheel with all the spokes going from the hub straight up to the rim. It would of course break down when rolling, but on the objective of "how much load can it handle on the bike" it again out-competed every other wheel, and thus sat on the Pareto front of that objective and kept showing up through all my tests. Hated that guy, heh. I later changed the setup to test every wheel in at least 4 orientations; it would then still taunt me with wheels like (c) in this figure [1], exploiting that.

[0]: https://news.ycombinator.com/item?id=10410813 [1]: https://imgur.com/a/LsONTGc
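The multi-orientation fix described above can be sketched in a few lines. Everything here is a made-up toy, not the thesis code: `load_capacity` stands in for the real wheel simulator, and a "wheel" is just a list of spoke angles. The point is that scoring a candidate by its worst orientation stops a design that only holds load in one special position from dominating.

```python
import math

def load_capacity(wheel, angle):
    # Stand-in for the real simulator: how much load the wheel
    # supports when tested rotated by `angle` radians. Toy model:
    # each spoke contributes only when roughly aligned with the load.
    return sum(max(0.0, math.cos(spoke + angle)) for spoke in wheel)

def robust_fitness(wheel, n_orientations=4):
    # Score a wheel by its WORST orientation, so a degenerate design
    # that only works in one pose can no longer sit on the Pareto front.
    angles = [2 * math.pi * k / n_orientations for k in range(n_orientations)]
    return min(load_capacity(wheel, a) for a in angles)

# Degenerate wheel: all spokes pointing straight up.
all_up = [0.0] * 8
# Conventional wheel: spokes spread evenly around the hub.
spread = [2 * math.pi * k / 8 for k in range(8)]

print(robust_fitness(all_up))   # fails badly in rotated tests
print(robust_fitness(spread))   # holds up in any orientation
```

Taking the minimum (rather than the mean) over orientations is the stricter choice: an exploit has to survive every test case, not just most of them.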

My favorite example was a game of pong with the goal of staying alive as long as possible. One ML algo just paused the game and left it like that.

  • My favorite was the ML learning to make the lowest-impact landing in a flight simulator. It discovered that it could wrap the impact float value if the impact was high enough, so instead of figuring out the optimal landing, it started figuring out the optimal path to the highest-impact crashes.

    • This comment ought to be higher up. Such a perfect summary of what I have struggled to articulate: the "danger" of AI once we allow it to control things.

      And yes, you can fix the bug, but the bike-wheel example shows there will always be another bug. We need a paper/proof that invents a process to put a finite cap or limiter on the possible bug surface, with AI support and without human intervention.


    • All these claims are like "programming is impossible because I typed in a program and it had a bug". Yes, everyone's first attempt at a reward function is hackable. So you have to tighten up the reward function to exclude solutions you don't want.


  • Is that Learnfun/Playfun that tom7 made? That one paused just before losing at Tetris and left it like that, because any other input would make it lose.

    • No, I want to say this was ~10 years ago. It happened to a university researcher, IIRC.
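The "tighten up the reward function" point can be illustrated with a toy version of the wrapped-impact story above. Everything here is invented for illustration (a hypothetical 16-bit impact sensor, made-up numbers): the naive reward trusts the recorded reading, so a hard enough crash wraps around and scores better than a gentle landing; the tightened version treats out-of-range states as failures instead.

```python
def impact_reading(true_impact):
    # Hypothetical bug: the sim stores impact in 16 bits, so a hard
    # enough crash wraps around to a tiny recorded value.
    return int(true_impact) % (1 << 16)

def naive_reward(true_impact):
    # "Softest landing wins": minimize the *recorded* impact.
    return -impact_reading(true_impact)

def tightened_reward(true_impact, sensor_max=(1 << 16) - 1):
    # One way to tighten the objective: reject implausible states
    # instead of trusting the raw number. In practice you'd validate,
    # clamp, or penalize anything outside the sensor's range.
    if true_impact > sensor_max:
        return float("-inf")
    return -impact_reading(true_impact)

gentle = 120
violent = (1 << 16) + 3   # wraps around to a recorded impact of 3

print(naive_reward(violent) > naive_reward(gentle))          # hack wins
print(tightened_reward(violent) > tightened_reward(gentle))  # hack loses
```

The catch, as the rest of the thread points out, is that each patch like this closes one exploit while the optimizer goes looking for the next.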

Make no mistake: most humans will exploit any glitches and bugs they can find for personal advantage in a game. It's just that machines can exploit timing bugs better.

  • Some people are able to do frame-perfect inputs semi-consistently, from what I understand. I don't understand how, as my own performance is around hitting a 100 ms window once every other time.

    • Maybe they have better equipment?

      If you're using a typical PC (or $deity forbid, a phone) with a typical consumer OS, there are several sources of variability between your controller and the visual feedback you receive from the game, each of which can randomly introduce delays on the order of milliseconds or more. "Randomly" is the key word here - lag itself is not the problem, the variability is.


There are a few very cool examples where someone recently used RL to solve Trackmania and ended up having to add all sorts of constraints/penalties to prevent the extremely strange exploits/glitches that were discovered, IIRC… it's been a while since I watched.

https://youtu.be/Dw3BZ6O_8LY?si=VUcJa_hfCxjZhhfR

https://youtu.be/NUl6QikjR04?si=DpZ-iqVdqjzahkwy

  • Well, in the case of the latter, there was a vaguely known glitch for driving on the nose that allowed better speeds than possible on four wheels, but it would be completely uncontrollable for a human. He figured out how to break the problem down into steps that the NN could gradually learn piecewise, until he had cars racing around tracks while balancing on their noses.

    It turned out to have learned to keep the car spinning on its nose for stability, timing inputs to upset the spinning balance at just the right moment so a tire touched the ground and shot the car off in the desired direction.

    I think the overall lesson is that, to make useful machine learning, we must break our problems down into pieces small enough that an algorithm can truly "build up skills" and learn naturally, under the correct guidance.

I'm almost 100% sure this is the link you're looking for: https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3Hs...

  • Haha that was actually the same one I posted in my comment.

    This was some old website. A coworker sent it to me on HipChat at my previous job about 10 years ago. And finding anything online older than about 5 years is nearly impossible unless you have the exact URL on hand.

    • Oh sorry! I recognized the description but since I recalled mine being a Google Sheets link, I just went straight into search mode - and yep, it actually took me a bit to find.

For the model, the weird glitches are just another element of the game. Since it can't reason and has no theory of the world, or even any real knowledge of what it is doing, the model doesn't have the prior assumptions a human would have about how the game is supposed to be played.

If you think about it, even using the term "perverse" is a result of us anthropomorphizing any object in the universe that does anything we believe is in the realm of things humans do.

> using perverse strategies that no human would do

Of course we do use perverse strategies and glitches in adversarial multiplayer all the time.

Case in point: the chainsaw glitch, tumblebuffs, early hits, and perfect blocks in Elden Ring.

On YouTube, Code Bullet remakes games so that he can try different AI techniques to beat them.