Comment by neepi

1 day ago

My only take home is they are all terrible and I should hire a professional.

29 comments

neepi

Before that, you might ask ChatGPT to create a vector image of a pelican riding a bicycle and then run the output through a PNG-to-SVG converter...

Result: https://www.dropbox.com/scl/fi/8b03yu5v58w0o5he1zayh/pelican...

These are tough benchmarks that test reasoning by having the model _write_ an SVG file by hand, which means understanding how the file has to be written to get the desired picture. Even a professional would struggle with that! It's _not_ a benchmark of how well an AI does when given the best tools for the job.

YuccaGloriosa 20 hours ago
I think you made an error there: PNG is a bitmap format.
- sethaurus 19 hours ago
  
  You've misunderstood. The parent was making a specific point: if you want an SVG of a pelican, the easiest way to AI-generate it is to get an image generator to create a (vector-styled) bitmap, then auto-vectorize it to SVG. But the point of this benchmark is that it's asking models to create an SVG the hard way, by writing its code directly.
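
For anyone unsure what "writing its code directly" looks like, here is a minimal Python sketch that assembles a tiny SVG as plain text, which is all a text-only model can do. The shapes, coordinates, and the `build_svg` helper are made up for illustration and are not taken from the benchmark.

```python
import xml.etree.ElementTree as ET


def build_svg(width=200, height=120):
    """Return a hand-written SVG document as a string (hypothetical toy drawing)."""
    shapes = [
        # Two wheels of a very rough bicycle.
        '<circle cx="60" cy="90" r="20" fill="none" stroke="black"/>',
        '<circle cx="140" cy="90" r="20" fill="none" stroke="black"/>',
        # Frame connecting the wheels.
        '<path d="M60 90 L100 60 L140 90" fill="none" stroke="black"/>',
        # A blob standing in for the rider.
        '<ellipse cx="100" cy="45" rx="18" ry="12" fill="white" stroke="black"/>',
    ]
    body = "\n  ".join(shapes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">\n  {body}\n</svg>'
    )


svg_text = build_svg()
# Parsing succeeds only if the hand-written markup is well-formed XML.
root = ET.fromstring(svg_text)
```

Every coordinate has to be chosen without ever seeing the rendered picture, which is roughly the difficulty the benchmark is probing.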

keiferski 1 day ago

As the other guy said, these are text models. If you want to make images use something like Midjourney.

Promoting a pelican riding a bicycle makes a decent image there.

keiferski 1 day ago

* Prompting

vunderba 17 hours ago

This test isn't really about the quality of the image itself (multimodals like gpt-image-1 or even standard diffusion models would be far superior) - it's about following a spec that describes how to draw.

A similar test would be if you asked for the pelican on a bicycle through a series of LOGO instructions.
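
As a rough illustration of the LOGO variant, here is a small Python sketch that interprets a three-command subset (`FD`, `RT`, `LT`) and records the points the turtle visits. The `run_logo` name, the starting heading, and the restricted command set are assumptions made for this example, not a real LOGO implementation.

```python
import math


def run_logo(program):
    """Interpret a tiny LOGO-like subset (FD, RT, LT) and return visited points."""
    # Assumed convention: start at the origin, facing "up" (90 degrees).
    x, y, heading = 0.0, 0.0, 90.0
    points = [(x, y)]
    tokens = program.split()
    # Commands come in (name, argument) pairs, e.g. "FD 10".
    for cmd, arg in zip(tokens[0::2], tokens[1::2]):
        value = float(arg)
        if cmd == "FD":
            x += value * math.cos(math.radians(heading))
            y += value * math.sin(math.radians(heading))
            points.append((round(x, 6), round(y, 6)))
        elif cmd == "RT":
            heading -= value  # turn clockwise
        elif cmd == "LT":
            heading += value  # turn counter-clockwise
        else:
            raise ValueError(f"unsupported command: {cmd}")
    return points


# Four forward moves with right turns in between trace a square.
square = run_logo("FD 10 RT 90 FD 10 RT 90 FD 10 RT 90 FD 10")
```

As with SVG, the model has to keep track of position and heading in its head to produce instructions that add up to a recognizable shape.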

spaceman_2020 1 day ago

My only take home is that a spanner can work as a hammer, but you probably should just get a hammer.

GaggiX 1 day ago

An expert at writing SVGs?

dist-epoch 1 day ago

Most of them are text-only models. Like asking a person born blind to draw a pelican, based on what they heard it looks like.

neepi 1 day ago
That seems to be a completely inappropriate use case?
I would not hire a blind artist or a deaf musician.
- simonw 1 day ago
  
  Yeah, that's part of the point of this. Getting a state of the art text generating LLM to generate SVG illustrations is an inappropriate application of them.
  It's a fun way to deflate the hype. Sure, your new LLM may have cost XX million to train and beat all the others on the benchmarks, but when you ask it to draw a pelican on a bicycle it still outputs total junk.
  
- __alexs 1 day ago
  
  I guess the idea is that by asking the model to do something that is inherently hard for it we might learn something about the baseline smartness of each model which could be considered a predictor for performance at other tasks too.
- dmd 1 day ago
  
  Sorry, Beethoven, you just don’t seem to be a match for our org. Best of luck on your search!
  You too, Monet. Scram.
- namibj 1 day ago
  
  It's a proxy for abstract designing, like writing software or designing in a parametric CAD.
  Most of the non-math design work of applied engineering AFAIK falls under the umbrella that's tested with the pelican riding the bicycle. You have to make a mental model and then turn it into applicable instructions.
  Program code/SVG markup/parametric CAD instructions don't really differ in that aspect.
  
- dist-epoch 1 day ago
  
  The point is about exploring the capabilities of the model.
  Like asking you to draw a 2D projection of a 4D sphere intersected with a 4D torus or something.
  
- wongogue 1 day ago
  
  Even Beethoven?

matkoniecz 1 day ago

It depends on the quality you need and your budget.

neepi 1 day ago
Ah yes the race to the bottom argument.
- ben_w 1 day ago
  
  When I was at university, they got some people from industry to talk to us all about our CVs and how to do interviews.
  My CV had a stupid cliché, "committed to quality", which they correctly picked up on — "What do you mean?" one of them asked me, directly.
  I thought this meant I was focussed on being the best. He didn't like this answer.
  His example, blurred by 20 years of my imperfect human memory, was to ask me which is better: a Porsche, or a go-kart. Now, obviously (or I wouldn't be saying this), Porsche was a trick answer. Less obvious is that both were trick answers, because their point was that the question was under-specified: quality is the match between the product and what the user actually wants. If the user is a 10 year old who physically isn't big enough to sit in a real car's driver's seat and just wants to rush down a hill or along a track, none of the "quality" stuff that makes a Porsche a Porsche is of any relevance at all. What does matter is the stuff that makes a go-kart into a go-kart, one part of which is the affordability.
  LLMs are go-karts of the mind. Sometimes that's all you need.
  