Comment by MagicMoonlight

5 days ago

They’re definitely RL-training the models on the pelican test. They patch any test that shows them performing poorly by hardcoding answers into the model.
