Comment by margalabargala

11 hours ago

I suspect they're training on this.

I asked Opus 4.6 for a pelican riding a recumbent bicycle and got this.

https://i.imgur.com/UvlEBs8.png

It would be way, way better if they were benchmaxxing this. The pelican in the image (both images) has arms. Pelicans don't have arms, and a pelican riding a bike would use its wings.

  • Having briefly worked in the 3D Graphics industry, I don't even remotely trust benchmarks anymore. The minute someone's benchmark performance becomes a part of the public's purchasing decision, companies will pull out every trick in the book--clean or dirty--to benchmaxx their product. Sometimes at the expense of actual real-world performance.

Interesting that it seems better. Maybe adding a highly specific yet unusual qualifier helps focus the model's attention?