Comment by TeMPOraL

1 month ago

Yeah, and if anything, RL has a rep for being too good at this job, because of all the cases where it gamed a benchmark by exploiting some environmental factor the supervisors hadn't thought of (numerical instabilities, rounding, bugs, etc.).