Comment by simonw
21 hours ago
I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!
I think that image cost 40 cents.
Fed into a clear Claude Code max-effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
Which is correct!
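For anyone curious what "sliced the image into small sections" could look like, here is a minimal sketch of one way to tile a 3840x2160 image and map a hit inside a tile back to full-image pixel coordinates. This is not what Claude Code actually ran (the thread doesn't say); the 960-pixel tile size and the names tile_boxes and to_global are hypothetical, chosen only for illustration.

```python
from typing import Iterator, Tuple

# (left, top, right, bottom) in full-image pixel coordinates
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int, overlap: int = 0) -> Iterator[Box]:
    """Yield crop boxes covering a width x height image, row by row.

    `tile` is the nominal tile edge in pixels; `overlap` lets neighbouring
    tiles share pixels so a small subject is less likely to be cut in half.
    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap
    for top in range(0, height, step):
        for left in range(0, width, step):
            yield (left, top, min(left + tile, width), min(top + tile, height))


def to_global(box: Box, x: int, y: int) -> Tuple[int, int]:
    """Convert a pixel location inside a tile to full-image coordinates."""
    return (box[0] + x, box[1] + y)


# Assumed tile size of 960 px; the real slicing used in the session is unknown.
boxes = list(tile_boxes(3840, 2160, tile=960))
```

Each box can be passed to an image library's crop call (for example Pillow's Image.crop, which takes the same left/top/right/bottom tuple), and a location reported within one tile is shifted by that tile's top-left corner to get the answer for the whole image.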
I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.
simonw posted 2 different images: make sure to look at the second one.
We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.
There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.
Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.
A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!
To be fair, the average person has fewer than two arms.
Most people have an ARM in their pockets, nowadays. And possibly on their wrist.
Haha. Underrated comment!
There is a leg that sprouts into part of a bush; perhaps that's where people's legs are disappearing to.
This is why they're congregating around the first aid and the lost and found
Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?
The faces...that's nice that it turned a kid's book into an abomination
By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.
But it's also straight up plagiarism and still ridiculously bad on so many levels.
It could already copy the art styles from its training data, what is the advancement here?
It's interesting that the raccoon is well defined because it was part of the request, but none of the other fauna are.
it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.
The real NFTs were the images we generated along the way
The people in this image remind me of early this person does not exist, in the best way
fair point, also "this raccoon does not exist"
I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.
https://postimg.cc/wyxgCgNY
Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)
mmmm yummy OSLS?
Can it generate a non-Halloween version though?
This lower-is-better danse macabre, nightmare-inducing ratio feels like an interesting proxy for a model's capability.
I found it on the 2nd image! On the 1st one not yet...
Cost me < 1 cent - https://elsrc.com/elsrc/waldo/wojak.jpg
And this medium-quality, high-resolution one was 13 cents: https://elsrc.com/elsrc/waldo/10_wojaks.jpg
p.s. aaaand that's the soft launch of my SaaS above. You can replace wojak.jpg with anything you want and it will paint that. It's basically appending to a prompt defined in elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, hn!
Some pretty funny but good examples:
https://elsrc.com/elsrc/waldo/10_schoolsofthought.jpg
https://elsrc.com/elsrc/waldo/10_anthropomorphizedcomputermo...
https://elsrc.com/elsrc/waldo/10_breathoffreshairsittingonad...
https://elsrc.com/elsrc/waldo/10_drizzydrakesdoingthedrakeme...
https://elsrc.com/elsrc/waldo/10_sashringingtrashsingingmash...
Ok i promise I'm done xD
That's way more than 10, around 50
Are you using the same prompt the above commenter used? I've been toying around with increasingly ridiculous prompts and it works surprisingly well. Is it the new ChatGPT image gen or Nano Banana?
It's pretty good tbh, even with absurd prompts
>I think that image cost 40 cents.
Kinda made me sad assuming the author didn't license anything to OpenAI.
I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.
$.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)
License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.
Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.
I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.