It's a lighthearted, fun, visual benchmark that's not part of the standard benchmarks; and at least traditionally, it was not something that the labs trained on so it was something of a measure of how well the intelligence of the model generalized. Part of the idea of LLMs is that they pick up general knowledge and reasoning ability, beyond any tasks that they are specifically trained for, from the vast quantity of data that they are trained on.
Of course, a while back there was a Gemini release that I believe specifically called out its ability to produce SVGs for illustration and diagramming purposes. So it's no longer necessarily the case that the labs aren't training on generating SVGs; in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process is generating tasks like that, given how much focus there now is on frontend and design in the LLM space. So while they might not be training specifically on a pelican riding a bicycle, they may well be training on SVG diagram quality.
This isn't even a normal pelican image post: this one created the HTML control system that animates how far the wing travels from its pivot in time with the wheel's rotation speed. Let's not pretend this is a solved problem and models are churning out perfect pelicans on bikes one after another (or ever?).
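For anyone curious what "animating the wing in time with the wheel" amounts to, here is a minimal sketch of the idea. All names and the one-flap-per-revolution ratio are my own assumptions for illustration, not taken from the actual demo:

```typescript
// Hypothetical sketch: drive a wing-flap angle from accumulated wheel
// rotation, so the flap rate stays locked to the wheel's speed.

// wheelAngleRad: accumulated wheel rotation in radians
// flapAmplitudeDeg: maximum wing deflection from its pivot, in degrees
function wingAngleDeg(wheelAngleRad: number, flapAmplitudeDeg = 30): number {
  // One full flap cycle per wheel revolution: the wing angle follows
  // a sine of the wheel angle, swinging between +/- flapAmplitudeDeg.
  return flapAmplitudeDeg * Math.sin(wheelAngleRad);
}
```

In a browser you would call this each frame (e.g. from `requestAnimationFrame`, advancing `wheelAngleRad` by the wheel speed times the frame delta) and apply the result to the wing's SVG `transform` as a `rotate(...)` about the pivot point.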
Surely you know someone makes the same post you did every time one is posted. Surely you see the answers and pushback, since you are familiar with these posts. Genuine question: did you expect a different answer this time?
Maybe this can help
https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
It doesn't, I get that it's _a_ benchmark. It's just not a good or insightful one, and having it posted so often on HN feels like low quality spam at this point
The issue is that benchmarks that look insightful end up being gamed by the labs quickly (Goodhart's law).
The best LLM benchmarks test around the margins of those behaviors: tasks that are difficult, correlate with usefulness, and are removed enough to stay unpolluted.
It's a great filter for people who take things far too seriously
It's tradition at this point. Based on the upvotes the comment receives, it looks like many readers find value in it.
Upvotes are cheap; the fact that something is upvoted doesn't mean it's valuable (see: Reddit). Another question is how insightful the discussion under a typical pelican comment actually is (and how much of it is about the pelican versus it just being where the general discussion happens).
It means somebody likes it.
[flagged]
> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.
https://news.ycombinator.com/newsguidelines.html
Every forum gets regulars and their fan clubs. If you go to /r/comics and look at the top posts for the month, you'll see 4 out of 5 are pizzacakecomic. People on these forums form a fan club around "their guy." This forum's guy is this chap. Not much point being upset about it, tbh.
I, for one, find it entertaining.
[dead]
[flagged]
[flagged]
Well clearly some people care.