Comment by ahoog42

7 hours ago

At what point do model providers optimize for the "pelican riding a bicycle" test so they place well on Simon's influential benchmark? :-)

4 comments

ahoog42

They almost certainly are, even if unknowingly, because HN and all blogs get piped continuously into every model's training corpus.

mudkipdev 3 hours ago
Why is the assumption that they trained for a pelican on a bicycle, rather than that they ran RL on all kinds of 'generate an SVG' tasks?
- simonw 1 hour ago
  
  Gemini did exactly that, and boasted about it at launch: https://x.com/JeffDean/status/2024525132266688757