Comment by johnfn
7 hours ago
The author says that he runs both the reference implementation and the new Rust implementation through 2 million (!) randomly generated battles and flags every battle where the results don't line up.
This is the key to the whole thing in my opinion.
If you ask a coding agent to port code from one language to another and don't have a robust mechanism to test that the results are equivalent, you're inevitably going to waste a lot of time and money on junk code that doesn't work.
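For anyone who hasn't set this up before, here is a minimal sketch of that kind of differential test. Everything in it is hypothetical: `random_battle`, `reference_impl`, and `ported_impl` are toy stand-ins I made up (the "port" has a deliberate bug), not the author's actual code or battle format.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical battle format: (starting HP, list of damage rolls).
Battle = Tuple[int, List[int]]


def random_battle(rng: random.Random) -> Battle:
    """Generate one random battle from a seeded RNG (assumed input shape)."""
    hp = rng.randint(1, 100)
    hits = [rng.randint(0, 30) for _ in range(rng.randint(1, 8))]
    return hp, hits


def reference_impl(battle: Battle) -> int:
    """Toy stand-in for the reference implementation: HP is clamped at 0."""
    hp, hits = battle
    for damage in hits:
        hp = max(hp - damage, 0)
    return hp


def ported_impl(battle: Battle) -> int:
    """Toy stand-in for the port, with a deliberate bug: HP can go negative."""
    hp, hits = battle
    return hp - sum(hits)


def differential_test(
    reference: Callable[[Battle], int],
    candidate: Callable[[Battle], int],
    n: int,
    seed: int = 0,
) -> List[Battle]:
    """Run both implementations on n random battles; return the mismatches."""
    rng = random.Random(seed)  # fixed seed so every failure is reproducible
    mismatches = []
    for _ in range(n):
        battle = random_battle(rng)
        if reference(battle) != candidate(battle):
            mismatches.append(battle)
    return mismatches


if __name__ == "__main__":
    n = 10_000
    failures = differential_test(reference_impl, ported_impl, n)
    print(f"pass rate: {100 * (n - len(failures)) / n:.2f}%")
    for battle in failures[:5]:
        print("mismatch:", battle)
```

The fixed seed matters more than it looks: each flagged battle can be replayed exactly, so you (or the agent) can debug one concrete failing case instead of a vague "results differ".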
Yeah, and he claims a pass rate of 99.96%. At that point you might be running into bugs in the original implementation.