Comment by ahoog42
7 hours ago
At what point do model providers start optimizing for the "pelican riding a bicycle" test so they place well on Simon's influential benchmark? :-)
They almost certainly are, even if unknowingly, because HN and blogs in general get piped continuously into every model's training corpus.
See https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
Why assume they trained specifically for a pelican on a bicycle, rather than running RL on all kinds of "generate an SVG" tasks?
Gemini did exactly that, and boasted about it at launch: https://x.com/JeffDean/status/2024525132266688757