Comment by jasonjmcghee

9 hours ago

What's crazy is you've influenced them to spend real effort ensuring their model is good at generating animated svgs of animals operating vehicles.

The most absurd benchmaxxing.

https://x.com/jeffdean/status/2024525132266688757?s=46&t=ZjF...

15 comments

simonw 8 hours ago

I like how they also did a frog on a penny-farthing and a giraffe driving a tiny car and an ostrich on roller skates and a turtle kickflipping a skateboard and a dachshund driving a stretch limousine.

jasonjmcghee 8 hours ago

OK Google, what are some other examples like a pelican riding a bicycle?

simultsop 7 hours ago

Reminds me of Andor: Luthen positively reinforcing the Emperor's time-wasting.

threatofrain 9 hours ago

Animated SVG is huge. People in different professions worry to different degrees about being replaced by ML, but this one is huge for digital art.

yieldcrv 7 hours ago

Yeah, complex SVGs are so much more bandwidth-, computation-, and energy-efficient than raster images, up to a point! But in general use we are not at that point, and there's so much more we can do with them.

I've been meaning to let coding agents take a stab at using the lottie library https://github.com/airbnb/lottie-web to supercharge the user experience without needing to make it a full-time job.

eurekin 9 hours ago

Can't wait until they finally get to real world CAD

tngranados 9 hours ago

There's a CAD example in that same thread: https://x.com/JeffDean/status/2024528776856817813

gibspaulding 2 hours ago

I know this isn’t necessarily “real world CAD” but Claude Code is not too shabby at OpenSCAD.

tantalor 9 hours ago

He's svg-mogging

gnatolf 9 hours ago

So let's put things we're interested in in the benchmarks.

I'm not against pelicans!

ghurtado 8 hours ago

I think the reason the pelican example is great is that it's bizarre enough that it's unlikely to appear in the training data as one unified picture.
If we picked something more common, like, say, a hot dog with toppings, then the training contamination would be much harder to control.
- troymc 5 hours ago
  
  I think it's now part of their training though, thanks to Simon constantly testing every new model against it, and sharing his results publicly.
  There's a specific term for this in education and applied linguistics: the washback effect.
- rvnx 8 hours ago
  
  It's the most common SVG test, the equivalent of Will Smith eating spaghetti, so obviously they benchmax toward it.

casey2 8 hours ago

You don't have to benchmax everything, just the benchmarks in the right social circles.

UltraSane 9 hours ago

It is funny to think that Jeff Dean personally worked to optimize the pelican riding a bike benchmark.