Comment by Retric

1 day ago

As an objective criterion, what percentage include pedals and a chain connecting to one of the wheels? I quickly found a dozen and stopped counting. Now do the same for those LLM images and it’s clear humans win.

> "Average human" is a much lower bar than most people want to believe

I have some basis for comparison. I’ve seen 6-year-olds draw better bikes than those LLMs.

Look through that list again: the worst example doesn’t even have wheels, and multiple of them have wheels that aren’t connected to anything.

Now, if you’re arguing the average human is worse than the average 6-year-old, I’m going to disagree.

> Given mandatory art lessons in school are longer than 10 months, and yet those bike examples exist, I have no reason to believe this.

Art lessons don’t cumulatively spend 10 months teaching people how to draw a bike. I don’t think I cumulatively spent 6 months drawing anything. Art covers a lot (painting, collage, sculpture, coloring, etc.) and wasn’t an everyday or even every-year thing. My mandatory college class was art history; we didn’t create any art.

You may have spent more time in class studying drawing, but that’s not some universal average.

> If you automate it in literally the manner in this write-up (pairwise comparison via API calls to another model to get ELO ratings), ten thousand images is like $60-$90, which is on the low end for a human commission.

Not every one of those images had a price tag, but one was 88 cents: 88 cents × 10,000 = $8,800 just to make the images for a test. Even at 4 cents per image you’re looking at $400. Cheaper models existed but fairly consistently had worse performance.
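
To make the arithmetic above easy to check, here is a minimal sketch. The 88-cent and 4-cent per-image prices and the 10,000-image count are the figures from this thread; the function name `batch_cost_dollars` is made up for illustration, and a flat per-image price is an assumption.

```python
def batch_cost_dollars(cents_per_image: float, n_images: int) -> float:
    """Total cost in dollars of n_images at a flat per-image price in cents."""
    return cents_per_image * n_images / 100


N_IMAGES = 10_000  # image count discussed in the thread

# Most expensive image in the write-up (88 cents) vs. a 4-cent image.
print(f"88c/image: ${batch_cost_dollars(88, N_IMAGES):,.0f}")
print(f" 4c/image: ${batch_cost_dollars(4, N_IMAGES):,.0f}")
```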

4 comments

simonw 1 day ago

The 88 cent one was the most expensive by almost an order of magnitude. Most of these cost less than a cent to generate - that's why I highlighted the price on the o1 pro output.

Retric 1 day ago
Yes, but if you’re averaging cheap and expensive options, the expensive ones make a significant difference. Cheaper prices are bounded below by 0, so they can’t differ as much from the average.
Also, when you’re talking about how cheap something is, including the price makes sense. I had no idea on many of those models.
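
A small illustration of the averaging point, using made-up prices rather than the real figures from the post: nine images at 1 cent each plus a single 88-cent image. The mean lands near 9.7 cents while the median stays at 1 cent.

```python
from statistics import mean, median

# Hypothetical per-image prices in cents: nine cheap models and one
# expensive outlier. These are NOT the actual prices from the write-up.
prices_cents = [1] * 9 + [88]

avg = mean(prices_cents)    # pulled upward by the single 88-cent image
mid = median(prices_cents)  # unaffected by the outlier

print(f"mean = {avg:.1f} cents, median = {mid:.1f} cents")
```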
- simonw 1 day ago
  
  If you're interested, you can get cost estimates from my pricing calculator site here: https://www.llm-prices.com/#it=11&ot=1200
  That link seeds it with 11 input tokens and 1200 output tokens - 11 input tokens is what most models use for "Generate an SVG of a pelican riding a bicycle" and 1200 is the number of output tokens used for some of the larger outputs.
  Click on different models to see estimated prices. They range from 0.0168 cents for Amazon Nova Micro (that's less than 2/100ths of a cent) up to 72 cents for o1-pro.
  The most expensive model most people would consider is Claude 4 Opus, at 9 cents.
  GPT-4o is the upper end of the most common prices, at 1.2 cents.
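
For anyone who wants to reproduce those estimates without the site, here is a rough sketch of the calculation. The token counts (11 in, 1200 out) are from the comment above; the per-million-token prices are assumptions, chosen because they are consistent with the quoted estimates, and may not match current list prices. `estimate_cents` is a hypothetical helper, not part of any tool mentioned here.

```python
# Assumed prices in USD per million tokens, as (input, output).
# Not stated in the thread; picked to be consistent with the quoted figures.
ASSUMED_PRICES = {
    "Amazon Nova Micro": (0.035, 0.14),
    "GPT-4o": (2.50, 10.00),
    "Claude 4 Opus": (15.00, 75.00),
    "o1-pro": (150.00, 600.00),
}


def estimate_cents(input_tokens: int, output_tokens: int,
                   price_in: float, price_out: float) -> float:
    """Estimated cost in cents for one request, given per-million-token prices."""
    dollars = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return dollars * 100


# 11 input tokens for the prompt, 1200 output tokens for a larger SVG.
for model, (p_in, p_out) in ASSUMED_PRICES.items():
    print(f"{model}: {estimate_cents(11, 1200, p_in, p_out):.4f} cents")
```

Output tokens dominate here: with only 11 input tokens, nearly all of each estimate comes from the 1200 output tokens.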
  