Comment by johnfn

13 days ago

The author says that he runs both the reference implementation and the new Rust implementation through 2 million (!) randomly generated battles and flags every battle where the results don't line up.

This is the key to the whole thing in my opinion.

If you ask a coding agent to port code from one language to the another and don't have a robust mechanism to test that the results are equivalent you're inevitably going to waste a lot of time and money on junk code that doesn't work.

  • Fuzzing handles the logic verification, but I'd be more worried about the architectural debt of mapping GC patterns to Rust. You often end up with a mess of Arc/Mutex wrappers and cloning just to satisfy the borrow checker, which defeats the purpose of the port.

    • That will vary depending on how the code is architected to begin with, and the problem domain. Single-ownership patterns can be refactored into Rust ownership, and a good AI model might be able to spot them even when not explicitly marked in the code.

      For some problems dealing with complex general graphs, you may even find it best to use a Rust-based general GC solution, especially if it can be based on fast concurrent GC.

Yeah and he claims a pass rate of 99.96%. At that point you might be running into bugs in the original implementation.

  • Not really. Due to combinatorial explosion some path is hard to hit randomly in this kind of source code. I would have preferred if after 2M random battles the reference implementation had 99% code coverage, than 99% pass rate.

    I don't know anything about Pokemon, but I briefly looked at the code. "weather" seemed like a self contained thing I could potentially understand. Looking at https://github.com/vjeux/pokemon-showdown-rs/blob/master/src...

    > NOTE: ignoringAbility() and abilityState.ending not fully implemented

    So it is almost certain even after 99.96% pass rate, it didn't hit battle with weather suppressing Pokemon but with ability ignored. Code coverage driven testing loop would have found and fixed this one easily.