Comment by wtallis

1 day ago

I don't think we have good tools for formally proving that a transformer's output will match a more traditionally defined function. But the leading transformers are small enough that formal verification may be possible.

Without any formal verification: the input space of two 10-digit numbers is roughly 10^20 pairs, or about 2^66, which is a bit bigger than 64 bits, so exhaustively verifying all possible inputs doesn't sound practical. Using the same subset of the input space to verify each submission seems like the easiest way to be fair, and that subset obviously must not be disclosed to the competitors.
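
A rough sketch of what that could look like, in Python. Everything here is an assumption for illustration: the names `hidden_test_set` and `score` are made up, operands are taken to range over 0 to 10^10 - 1, and the reference function is passed in as a parameter (plain addition is used below only as a stand-in). The organizers would keep `secret_seed` private, so every submission is scored on the same undisclosed pairs.

```python
import math
import operator
import random

DIGITS = 10
LIMIT = 10 ** DIGITS  # assumption: operands range over 0 .. 10**10 - 1


def input_space_bits(digits: int = DIGITS) -> float:
    """Bits needed to index every pair of `digits`-digit operands."""
    return math.log2(10 ** (2 * digits))


def hidden_test_set(secret_seed: int, n: int) -> list[tuple[int, int]]:
    """Derive the same n operand pairs from a seed only the organizers know."""
    rng = random.Random(secret_seed)  # deterministic for a given seed
    return [(rng.randrange(LIMIT), rng.randrange(LIMIT)) for _ in range(n)]


def score(submission, reference, cases) -> float:
    """Fraction of cases where the submission agrees with the reference."""
    correct = sum(1 for a, b in cases if submission(a, b) == reference(a, b))
    return correct / len(cases)


if __name__ == "__main__":
    print(f"input space: about 2^{input_space_bits():.1f} pairs")
    cases = hidden_test_set(secret_seed=12345, n=1000)  # placeholder seed
    print(score(operator.add, operator.add, cases))
```

Uniform sampling like this only estimates an error rate; it can easily miss rare failures such as long carry chains, so a real test set would probably mix in hand-picked edge cases as well. Python's `random` module is also not cryptographically secure, which is fine as long as the seed and the resulting pairs stay private.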