Comment by ben_w

1 day ago

More that "these models work … like humans" (discretely or otherwise) does not imply the quotation.

Most humans do not have perfect drawing skills and perfect knowledge about bikes and birds; they do not output such a simple drawing correctly 100% of the time.

"Average human" is a much lower bar than most people want to believe, mainly because most of us are average on most skills, and also overestimate our own competence — the modal human has just a handful of things they're good at, and one of those is the language they use, another is their day job.

Most of us can't draw, and demonstrably can't remember (or figure out from first principles) how a bike works. But this also applies to "smart" subsets of the population: physicists have https://xkcd.com/793/, and there's this famous rocket scientist who weighed in on rescuing kids from a flooded cave and came up with some nonsense about a submarine.

It’s not that humans have perfect drawing skills, it’s that humans can judge their performance and get better over time.

Ask 100 random people to draw a bike in 10 minutes and they’ll on average suck while still beating the LLMs here. Give em an incentive and 10 months and the average person is going to be able to make at least one quite decent drawing of a bike.

The cost and speed advantage of LLMs is real as long as you’re fine with extremely low quality. Ask a model for 10,000 drawings so you can pick the best and you get marginal improvements based on random chance at a steep price.

  • > Ask 100 random people to draw a bike in 10 minutes and they’ll on average suck while still beating the LLMs here.

    Y'see, this is a prime example of what I meant with ""Average human" is a much lower bar than most people want to believe, mainly because most of us are average on most skills, and also overestimate our own competence".

    An expert artist can spend 10 minutes and end up with a brief sketch of a bike. You can witness this exact duration yourself (with non-bike examples) because of a challenge a few years back to draw the same picture in 10 minutes, 1 minute, and 10 seconds.

    A normal person spending as much time as they like gets you the pictures that I linked to in the previous post, because they don't really know what a bike is. 45 examples of what normal people think a bike looks like: https://www.gianlucagimini.it/portfolio-item/velocipedia/

    > Give em an incentive and 10 months and the average person is going to be able to make at least one quite decent drawing of a bike.

    Given mandatory art lessons in school are longer than 10 months, and yet those bike examples exist, I have no reason to believe this.

    > Ask a model for 10,000 drawings so you can pick the best and you get marginal improvements based on random chance at a steep price.

    If you do so as a human, rating and comparing images? Then the cost is your own time.

    If you automate it in literally the manner in this write-up (pairwise comparison via API calls to another model to get Elo ratings), ten thousand images is like $60-$90, which is on the low end for a human commission.
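    A minimal sketch of what "pairwise comparison to get Elo ratings" can look like, for readers who haven't seen it. This is not the write-up's code: `rank_by_elo`, the starting rating of 1500, and K = 32 are conventional choices assumed here, and `stand_in_judge` is a hypothetical placeholder for the API call to a judging model.

```python
import itertools
import random


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A when playing B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one comparison (zero-sum update)."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def rank_by_elo(items, judge, rounds=3, start=1500.0, seed=0):
    """Rate items from pairwise verdicts; judge(a, b) is True if a wins.

    Every pair is compared once per round, in shuffled order.
    """
    rng = random.Random(seed)
    ratings = {item: start for item in items}
    pairs = list(itertools.combinations(items, 2))
    for _ in range(rounds):
        rng.shuffle(pairs)
        for a, b in pairs:
            ratings[a], ratings[b] = update(ratings[a], ratings[b], judge(a, b))
    ranking = sorted(items, key=lambda item: ratings[item], reverse=True)
    return ranking, ratings


# Hypothetical stand-in for the judging model: it compares made-up
# quality scores instead of looking at real images.
hidden_quality = {"img_a": 0.9, "img_b": 0.6, "img_c": 0.3, "img_d": 0.1}


def stand_in_judge(a: str, b: str) -> bool:
    return hidden_quality[a] > hidden_quality[b]


ranking, ratings = rank_by_elo(list(hidden_quality), stand_in_judge)
print(ranking[0], "is rated best;", ranking[-1], "is rated worst")
```

    Comparing every pair only works because the demo has four items. At 10,000 images there are 49,995,000 distinct pairs, so a real run would have to sample a subset of matchups, and the number of judge calls is what drives the bill.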

    • As an objective criterion, what percentage include pedals and a chain connected to one of the wheels? I quickly found a dozen and stopped counting. Now do the same for those LLM images and it’s clear humans win.

      > ""Average human" is a much lower bar than most people want to believe

      I have some basis for comparison. I’ve seen 6-year-olds draw better bikes than those LLMs.

      Look through that list again: the worst example doesn't even have wheels, and multiple of them have wheels that aren't connected to anything.

      Now if you’re arguing the average human is worse than the average 6 year old I’m going to disagree here.

      > Given mandatory art lessons in school are longer than 10 months, and yet those bike examples exist, I have no reason to believe this.

      Art lessons don’t cumulatively spend 10 months teaching people how to draw a bike. I don’t think I cumulatively spent 6 months drawing anything. Painting, collage, sculpture, coloring, etc.: art covers a lot, and it wasn’t an everyday or even every-year thing. My mandatory college class was art history; we didn’t create any art.

      You may have spent more time in class studying drawing, but that’s not some universal average.

      > If you automate it in literally the manner in this write-up (pairwise comparison via API calls to another model to get Elo ratings), ten thousand images is like $60-$90, which is on the low end for a human commission.

      Not every one of those images had a price tag, but one was 88 cents; × 10,000 = $8,800 just to make the images for a test. Even at 4c/image you’re looking at $400. Cheaper models existed but fairly consistently had worse performance.
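      The arithmetic in this exchange can be checked with a few lines. The per-image prices and the $60-$90 range are the figures quoted in the thread; the helper names are made up for illustration, and reading $60-$90 as a total for 10,000 images is an assumption.

```python
from decimal import Decimal

N_IMAGES = 10_000


def total_cost(price_per_image: Decimal, n: int = N_IMAGES) -> Decimal:
    """Total cost in dollars of generating n images at a flat price."""
    return price_per_image * n


def implied_price(total: Decimal, n: int = N_IMAGES) -> Decimal:
    """Per-image price in dollars implied by a total bill for n images."""
    return total / n


print(total_cost(Decimal("0.88")))    # the 88-cent image: 8800.00
print(total_cost(Decimal("0.04")))    # the 4c/image case: 400.00
print(implied_price(Decimal("60")))   # low end of $60-$90: 0.006
print(implied_price(Decimal("90")))   # high end of $60-$90: 0.009
```

      So the two estimates differ because they assume very different per-image prices: under a cent per image on one side, 4 to 88 cents on the other.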


    • > A normal person spending as much time as they like gets you the pictures that I linked to in the previous post, because they don't really know what a bike is. 45 examples of what normal people think a bike looks like: https://www.gianlucagimini.it/portfolio-item/velocipedia/

      A normal person given the ability to consult a picture of a bike while drawing will do much better. An LLM agent can effectively refresh its memory (or attempt to look up information on the Internet) any time it wants.
