simonw 1 day ago
OK that is recognizably a pelican, pretty great!

qingcharles 1 day ago
This feels like the best pelicanbike yet. The singularity might be closer than we imagine.
Time for a leaderboard?

lostmsu 1 day ago
Ask and you'll receive: https://pelicans.borg.games/

espadrine 1 day ago
It would be interesting to have two generations per model, without cherry-picking, so that the Elo estimate can come with an easy-to-compute standard deviation.

qingcharles 1 day ago
Nice! Is there a way I can click on the leaderboard items so I can view them?

prabhasp 1 day ago
Lol, can you add a "both of these are terrible" option?

taytus 19 hours ago
I think they (LLM providers) are manually tuning these cases/examples.
Pelican on a bike -> some dude (from these labs) creates it, and it becomes part of the training data.
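
To make espadrine's suggestion concrete, here is a minimal sketch of how two generations per model could give a per-model Elo mean and standard deviation. It is an illustration only, not how pelicans.borg.games works: the vote format (winner, loser), the K-factor of 32, the starting rating of 1000, and all function names are assumptions made for this example. Each generation is rated as its own "player", and the spread between a model's two generations serves as a rough standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(votes, k=32.0, start=1000.0):
    """Run sequential Elo updates over (winner, loser) votes.

    Each player is a (model, generation_index) tuple, so every
    generation gets its own rating. k and start are assumed values.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def per_model_summary(ratings):
    """Collapse per-generation ratings into (mean, sample std dev) per model."""
    by_model = defaultdict(list)
    for (model, _generation), rating in ratings.items():
        by_model[model].append(rating)
    return {
        model: (mean(rs), stdev(rs) if len(rs) > 1 else float("nan"))
        for model, rs in by_model.items()
    }


# Hypothetical votes: two generations (0 and 1) for each of two made-up models.
votes = [
    (("model-a", 0), ("model-b", 0)),
    (("model-a", 1), ("model-b", 1)),
    (("model-b", 0), ("model-a", 1)),
    (("model-a", 0), ("model-b", 1)),
]
for model, (avg, sd) in per_model_summary(elo_ratings(votes)).items():
    print(f"{model}: {avg:.1f} +/- {sd:.1f}")
```

With only two samples per model the standard deviation is a coarse estimate, and sequential Elo updates depend on vote order, so this should be read as a rough error bar rather than a confidence interval.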