Comment by bodegajed
5 hours ago
It is like reward hacking, where the reward function (in this case, the test) is exploited to achieve its goals. It wants to declare victory and be rewarded, so the tests it writes are not critical of the code under test. This is probably in the RL pre-training data; I am, of course, merely speculating.
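To make that concrete, here is a minimal sketch of the difference between a test that only hands out the reward and one that is actually critical of the code. All names here (`apply_discount`, `vacuous_test`, `critical_test`) are hypothetical and made up for illustration; the bug is planted on purpose.

```python
def apply_discount(price, percent):
    # Hypothetical function under test, deliberately buggy:
    # it adds the discount instead of subtracting it.
    return price + price * percent / 100


def vacuous_test():
    # "Reward-hacked" test: it only checks that a number comes back,
    # so it passes no matter what the function actually computes.
    result = apply_discount(100, 10)
    return isinstance(result, (int, float))


def critical_test():
    # Critical test: it pins the expected value, so the bug is caught.
    return apply_discount(100, 10) == 90


print(vacuous_test())   # True: the test "passes" despite the bug
print(critical_test())  # False: the bug is exposed
```

Both tests run against the same broken function. If passing tests is the only thing that gets rewarded, the first kind is the cheapest way to get there.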