Comment by TeMPOraL

4 months ago

Yeah, and if anything, RL has a rep for being too good at this job, because of all the cases where it gamed a benchmark by exploiting some quirk of the environment the supervisors hadn't thought of (numerical instabilities, rounding, bugs, etc.).
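
A toy sketch of that failure mode, not taken from any real benchmark: a 10-cell corridor where the agent starts at cell 0, pays -1 per step, and is meant to walk right to the goal at cell 9. Everything here is hypothetical (the corridor, `step_intended`, `step_buggy`, `plan`), and plain value iteration stands in for an actual RL learner, on the assumption that any return-maximizing method would land on the same shortcut. The only "bug" is that the shipped environment uses modulo where it should clamp.

```python
N_CELLS = 10           # hypothetical corridor: cells 0..9
GOAL = N_CELLS - 1     # episode ends when the agent reaches cell 9
ACTIONS = (-1, +1)     # step left, step right


def step_intended(pos, move):
    """What the designers meant: solid walls at both ends."""
    return max(0, min(N_CELLS - 1, pos + move))


def step_buggy(pos, move):
    """What was shipped: modulo wraps cell 0 around to the goal."""
    return (pos + move) % N_CELLS


def plan(step_fn, sweeps=50):
    """Value iteration with reward -1 per step, then a greedy rollout from cell 0."""
    value = [0.0] * N_CELLS  # value[GOAL] stays 0: it is terminal
    for _ in range(sweeps):
        for pos in range(N_CELLS):
            if pos != GOAL:
                value[pos] = max(-1.0 + value[step_fn(pos, m)] for m in ACTIONS)

    # Follow the highest-valued neighbour until the goal is reached.
    path, pos = [0], 0
    while pos != GOAL:
        pos = max((step_fn(pos, m) for m in ACTIONS), key=lambda nxt: value[nxt])
        path.append(pos)
    return path


print(plan(step_intended))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(plan(step_buggy))     # [0, 9]
```

Against the intended dynamics the optimizer takes the nine-step walk; against the shipped ones it steps left once, falls through the wraparound, and posts a far better score without doing the task anyone had in mind.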