Comment by dgacmu
1 month ago
Oh man, that's funny to see one of my grad school class projects in that list. Takes me back. :-)
From that experience: The LLM is likely to do drastically better. Most of the prior work, mine included, took a genetic algorithm approach, but an LLM is more likely to make coherent multi-instruction modifications.
It's a shame they didn't compare against some of the standard Core War benchmarks as a way to facilitate comparisons to prior work, though. Makes it hard to say for sure that they're better. https://corewar.co.uk/bench.htm
For anybody who stumbles over this thread and is curious:
Ring Warrior Enhanced v9 has a Wilkies score of 34, and
Spiral Bomber Optimized v22 has a Wilkies score of 85.
At least that's what my quick and dirty check with exMars says :-)
34 is not that great. 85 is better, but I think some Core War evolvers can match it. For instance, the MEVO example at https://newton.freehostia.com/net/corewar/evol/ describes an evolved warrior with a score of 93.
I'm not sure if that will hold up. The LLM is not going to do anything random, and randomness is actually a powerful component that makes original output possible.
I wonder if a combination would be useful. Use an actual GA to do the mutation, and then let an LLM "fix" each mutated child.
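To make that concrete, here's a rough sketch of what one generation of such a hybrid could look like. Everything in it is an assumption for illustration: warriors are just lists of instruction strings, `OPCODES` is a small subset of Redcode, and `repair` and `fitness` are hypothetical callbacks. In a real setup `repair` would wrap an LLM call and `fitness` would be a benchmark score from a MARS simulator; the toy stand-ins below only let the loop run.

```python
import random
from typing import Callable, List

Warrior = List[str]  # one Redcode instruction per entry, e.g. "MOV 0, 1"

# Small illustrative subset of Redcode opcodes, not the full instruction set.
OPCODES = ["MOV", "ADD", "SUB", "JMP", "JMZ", "DJN", "SPL", "DAT"]


def mutate(warrior: Warrior, rng: random.Random, rate: float = 0.2) -> Warrior:
    """Blind GA-style mutation: replace the opcode of some instructions at random."""
    child = []
    for line in warrior:
        if rng.random() < rate:
            _, _, operands = line.partition(" ")
            line = f"{rng.choice(OPCODES)} {operands}".strip()
        child.append(line)
    return child


def evolve_step(
    population: List[Warrior],
    fitness: Callable[[Warrior], float],
    repair: Callable[[Warrior], Warrior],
    rng: random.Random,
    keep: int = 2,
) -> List[Warrior]:
    """One generation: mutate each parent, repair each child, keep the best `keep`."""
    children = [repair(mutate(w, rng)) for w in population]
    ranked = sorted(population + children, key=fitness, reverse=True)
    return ranked[:keep]


# Hypothetical stand-ins so the sketch runs without an LLM or a simulator.
def identity_repair(warrior: Warrior) -> Warrior:
    return warrior  # a real version would ask an LLM to fix the mutated child


def toy_fitness(warrior: Warrior) -> float:
    return sum(line.startswith("MOV") for line in warrior)  # not a real score


if __name__ == "__main__":
    rng = random.Random(0)
    population = [["MOV 0, 1"], ["DAT 0, 0", "JMP -1"]]
    print(evolve_step(population, toy_fitness, identity_repair, rng))
```

Parents stay in the selection pool, so a bad "fix" from the LLM can't make the best score go down; the randomness still comes entirely from the GA side.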
Could be. But the interesting thing is that all you can do here is optimize. Random chance is - like attention ;) - all you need.