Digital Red Queen: Adversarial Program Evolution in Core War with LLMs

1 month ago (sakana.ai)

18 comments

hardmaru

Using evolution in the context of Core War is far from a new idea; the paper itself even references it.

Examples here: https://corewar.co.uk/evolving.htm

The difference here is that instead of using a typical genetic algorithm written in a programming language, it uses LLM prompts to do the same thing.

I wonder if the authors tried any of the existing "evolvers" to compare against what the LLM produced.

dgacmu 1 month ago
Oh man, that's funny to see one of my grad school class projects in that list. Takes me back. :-)
From that experience: The LLM is likely to do drastically better. Most of the prior work, mine included, took a genetic algorithm approach, but an LLM is more likely to make coherent multi-instruction modifications.
It's a shame they didn't compare against some of the standard core wars benchmarks as a way to facilitate comparisons to prior work, though. Makes it hard to say that they're better for sure. https://corewar.co.uk/bench.htm
- throw_paper 1 month ago
  
  For anybody who stumbles over this thread and is curious:
  Ring Warrior Enhanced v9 has a Wilkies score of 34, and
  Spiral Bomber Optimized v22 has a Wilkies score of 85.
  At least that's what my quick and dirty check with exMars says :-)
  34 is not that great. 85 is better, but I think some Core War evolvers can match it. For instance, the MEVO example at https://newton.freehostia.com/net/corewar/evol/ describes an evolved warrior with a score of 93.
- jacquesm 1 month ago
  
  I'm not sure if that will hold up. The LLM is not going to do anything random, and randomness is actually a powerful component that makes original output possible.
  
Ieghaehia9 1 month ago
That in turn makes me wonder:
Given fixed opposition, finding a warrior that performs the best is an optimization problem. Maybe, for very small core sizes like a nano core, it would be possible to find the optimum directly by SAT or SMT instead of using evolution? Or would it be impractical even for those core sizes?
- slickytail 1 month ago
  
  I think it would, for all practical purposes, be impossible to determine an optimal warrior, even at very small core sizes. Not only is the search space huge but the evaluation function can take unbounded time to resolve. We should consider the halting problem embedded inside the optimization target as a clue to the problem's difficulty.
  
api 1 month ago

See also:
https://en.wikipedia.org/wiki/Tierra_(computer_simulation)
https://avida-ed.msu.edu
https://github.com/adamierymenko/nanopond
There are lots of evolving-bug, Core War-style systems around.
I think the interesting thing with this one is they're having LLMs create evolving agents instead of blind evolution or some similar ML system.

JKCalhoun 1 month ago

What a lovely period of time that was—when "Computer Recreations" ran monthly in Scientific American. I read the column every month and was fascinated to learn about Eliza, Core Wars, Conway's Life, Wa-Tor, etc. It was a time when you coded simply for the fun of it—to explore, learn.

I know you can still do that today, but… something has changed. I don't know what it is. (Maybe I changed.)

Anyway, I was unable to track down PDF versions of the original articles, but, for the curious and newcomers to Core Wars, they're transcribed here:

https://corewar.co.uk/dewdney/

idiotsecant 1 month ago

Computers are no longer something fresh and new. They are firmly in the realm of stuff that exists and has Rules. The frontier is dead.

hardmaru 1 month ago

Hi HN,

I am one of the authors from Sakana AI and MIT. We just released this paper where we hooked up LLMs to the classic 1984 programming game Core War. For those who haven't played it, Core War involves writing assembly programs in a language called Redcode that battle for control of a virtual computer's memory. You win by crashing the opponent's process while keeping yours running. It is a Turing-complete environment where code and data share the same address space, which leads to some very chaotic self-modifying code dynamics.
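For readers new to the game, here is a minimal sketch of the mechanics described above: a shared circular memory in which instructions are also data, and a process that dies when it executes a `DAT`. It is a toy, not a real MARS and not the simulator used in the paper. It assumes one process per warrior (no `SPL`), only four opcodes, three addressing modes, and simplified ICWS'88-style semantics. `CORE_SIZE`, `MAX_CYCLES`, `Instr`, `address`, `step` and `run_battle` are illustrative names. Imp and Dwarf are the classic warriors from the Dewdney articles.

```python
from dataclasses import dataclass, replace

CORE_SIZE = 80      # assumed toy core size
MAX_CYCLES = 1000   # assumed cycle limit; an undecided battle is a tie

@dataclass
class Instr:
    op: str = "DAT"
    a_mode: str = "$"   # "#" immediate, "$" direct, "@" B-indirect
    a: int = 0
    b_mode: str = "$"
    b: int = 0

def address(core, pc, mode, value):
    """Resolve an operand to an absolute core address."""
    if mode == "#":                        # immediate: operand is the instruction itself
        return pc
    target = (pc + value) % CORE_SIZE      # direct: relative to the current instruction
    if mode == "@":                        # indirect: follow the B-field stored at target
        target = (target + core[target].b) % CORE_SIZE
    return target

def step(core, pc):
    """Execute one instruction; return the next pc, or None if the process dies."""
    ins = core[pc]
    src = address(core, pc, ins.a_mode, ins.a)
    dst = address(core, pc, ins.b_mode, ins.b)
    if ins.op == "DAT":                    # executing data kills the process
        return None
    if ins.op == "JMP":
        return src
    if ins.op == "MOV":
        if ins.a_mode == "#":
            core[dst].b = ins.a % CORE_SIZE
        else:
            core[dst] = replace(core[src])  # copy a whole instruction: code is data
    elif ins.op == "ADD":
        if ins.a_mode == "#":
            core[dst].b = (core[dst].b + ins.a) % CORE_SIZE
        else:
            core[dst].a = (core[dst].a + core[src].a) % CORE_SIZE
            core[dst].b = (core[dst].b + core[src].b) % CORE_SIZE
    return (pc + 1) % CORE_SIZE

def run_battle(program_a, program_b, start_b, max_cycles=MAX_CYCLES):
    """Return 0 or 1 for the surviving warrior, or None for a tie at the cycle limit."""
    core = [Instr() for _ in range(CORE_SIZE)]
    pcs = [0, start_b % CORE_SIZE]
    for program, start in zip((program_a, program_b), pcs):
        for offset, ins in enumerate(program):
            core[(start + offset) % CORE_SIZE] = replace(ins)
    for _ in range(max_cycles):
        for me in (0, 1):                  # warriors alternate, one instruction each
            pcs[me] = step(core, pcs[me])
            if pcs[me] is None:
                return 1 - me
    return None

# Imp: copies itself one cell ahead, forever.
IMP = [Instr("MOV", "$", 0, "$", 1)]
# Dwarf: drops a DAT "bomb" on every fourth cell.
DWARF = [
    Instr("ADD", "#", 4, "$", 3),
    Instr("MOV", "$", 2, "@", 2),
    Instr("JMP", "$", -2, "$", 0),
    Instr("DAT", "#", 0, "#", 0),
]
```

Calling `run_battle(DWARF, [Instr("JMP", "$", 0)], 43)` pits the Dwarf against a stationary loop placed on a cell it eventually bombs. Placing the same loop one cell earlier (start 42) keeps it off the bombing pattern, so neither side dies and the battle ends as a tie.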

We did not just ask the model to write winning code from scratch. Instead, we treated the LLM as a mutation operator within a quality-diversity algorithm called MAP-Elites. The system runs an adversarial evolutionary loop where new warriors are continually evolved to defeat the champions of all previous rounds. We call this Digital Red Queen because it mimics the biological hypothesis that species must continually adapt just to survive against changing competitors.
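The structure of that loop can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the released code: warriors are tuples of digits rather than Redcode, `llm_mutate_stub` is a hypothetical random edit standing in for the LLM call, `fitness` is a made-up cyclic scoring rule standing in for simulated battles, and `descriptor` is a placeholder for a real behavior measurement. Seeding each round from the previous champion is also an assumption. Only the overall shape (a MAP-Elites archive that keeps the best warrior per behavior cell, rerun each round against all earlier champions) follows the description above.

```python
import random

GRID = 4  # bins per behavior axis (assumed)

def llm_mutate_stub(parent, rng):
    """Hypothetical placeholder for the LLM mutation: a random one-gene edit."""
    child = list(parent)
    child[rng.randrange(len(child))] = rng.randrange(10)
    return tuple(child)

def descriptor(warrior):
    """Toy behavior descriptor: maps a warrior to one cell of a GRID x GRID archive."""
    return (warrior[0] % GRID, warrior[1] % GRID)

def fitness(warrior, opponents):
    """Toy score in [0, 1]: a gene wins if it is cyclically 1..4 ahead of the opponent's."""
    wins = sum(
        1
        for opp in opponents
        for w, o in zip(warrior, opp)
        if 1 <= (w - o) % 10 <= 4
    )
    return wins / (len(opponents) * len(warrior))

def map_elites(opponents, seed_warrior, iterations, rng):
    """Keep the best warrior found so far in each behavior cell."""
    archive = {descriptor(seed_warrior): (fitness(seed_warrior, opponents), seed_warrior)}
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))   # pick a random elite
        child = llm_mutate_stub(parent, rng)             # mutate it
        score, cell = fitness(child, opponents), descriptor(child)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)               # new cell or better elite
    return archive

def red_queen(rounds, iterations, seed=0):
    """Each round evolves against every earlier champion, then adds its own."""
    rng = random.Random(seed)
    champions = [(0, 0, 0, 0)]                           # arbitrary starting warrior
    for _ in range(rounds):
        archive = map_elites(champions, champions[-1], iterations, rng)
        champions.append(max(archive.values(), key=lambda e: e[0])[1])
    return champions
```

For example, `red_queen(rounds=3, iterations=200, seed=1)` returns the starting warrior followed by three champions. Because the toy scoring rule is cyclic, no single warrior dominates every opponent, which is the kind of moving target the Red Queen framing refers to.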

The most interesting result for us was observing convergent evolution. We ran independent experiments starting from completely different random seeds, yet the populations consistently gravitated toward similar behavioral phenotypes, specifically regarding memory coverage and thread spawning. It mirrors how biological species independently evolve similar traits like eyes to solve similar problems. We also found that this training loop produced generalist warriors that were robust even against human-written strategies they had never encountered during training.

We think Core War is an under-utilized sandbox for studying these kinds of adversarial dynamics. It lets us simulate how automated systems might eventually compete for computational resources in the real world, but in a totally isolated environment. The simulation code and the prompts we used are open source on GitHub.