Comment by MrManatee
9 hours ago
Prompts like this feel like they're using the wrong abstraction. The obvious thing to do with a request like this would be to generate code that produces the image, and then run that code.
Inspired by this, I tried something much simpler: I asked it to draw 12 concentric circles. In three tries, it drew 10 every time. https://chatgpt.com/share/69e87d08-5a14-83eb-9a3b-3a8eb14692...
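The code-generation route from the parent comment handles this trivially. A minimal sketch using only Python's stdlib to emit an SVG of 12 concentric circles (canvas size, spacing, and styling are my own arbitrary choices):

```python
# Sketch: emit an SVG with exactly n concentric circles.
# Canvas size and stroke styling are arbitrary choices.
def concentric_circles_svg(n=12, size=260, step=10):
    cx = cy = size // 2
    circles = "\n".join(
        f'  <circle cx="{cx}" cy="{cy}" r="{step * (i + 1)}" '
        f'fill="none" stroke="black"/>'
        for i in range(n)
    )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">\n{circles}\n</svg>')

svg = concentric_circles_svg()
print(svg.count("<circle"))  # counted, not eyeballed: 12
```

The point is that the count is enforced by a loop bound rather than by the model "eyeballing" the drawing, which is exactly where the direct image generation failed.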
I think prompts like this are where agentic workflows come into play. If you asked it to generate the first 64 prime numbers, AI tools could do that. If you asked it to draw a charcoal image of Pokemon 13, it could do that. If you asked it to overlay that number in white 13pt Menlo on a black background in the top-left corner of the image, it could do that. If you asked it to do that 63 more times, it could, and if you asked it to assemble the results into a grid, it could do that too.
It can't do all of that in one shot. Perhaps, though, it could figure out when it needs to break a problem into individual tasks, delegate them to itself, and assemble the results at the end.
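The first subtask above (the 64 primes) is the kind of thing a model should delegate to a code tool rather than do "in its head". A minimal sketch, using plain trial division since nothing fancier is needed at this scale:

```python
# Sketch: first n primes by trial division -- the sort of subtask
# an agent can hand off to a code tool instead of reasoning it out.
def first_primes(n=64):
    primes = []
    candidate = 2
    while len(primes) < n:
        # Only test prime divisors up to sqrt(candidate).
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

primes = first_primes()
print(len(primes), primes[0], primes[-1])  # 64 2 311
```

Each later step (render an image per prime, label it, tile into a grid) would similarly be a separate tool call keyed off this list.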
That's what makes it a fair evaluation of its limits.
I mean, asking these transformers to do maths has always been the wrong task. It's like complaining that they don't come with x tools built from traditional code baked in.
Though I suppose we're testing the model plus its agent harness here as well. It really _should_ have all of those tools and reasoning steps available to accomplish a task like the above without issue.