Comment by simonw

20 hours ago

I've been trying out the new model like this:

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run https://tools.simonwillison.net/python/openai_image.py \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"

Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...

Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160

https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!

I think that image cost 40 cents.
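For reference, a minimal Python sketch of roughly what a script like this would send through the OpenAI SDK: it just packages the model, size, and quality values from the commands above into keyword arguments for an `images.generate` call. The helper function is illustrative, and whether the live API accepts this exact model name and size is an assumption taken from the commands, so the actual call is left commented out:

```python
# Hedged sketch: assemble request parameters for an image-generation call.
# "gpt-image-2" and "3840x2160" are taken from the commands above; whether
# the live API accepts them is an assumption, not something verified here.

def image_request_params(prompt, model="gpt-image-2",
                         size="3840x2160", quality="high"):
    """Build the keyword arguments for client.images.generate(...)."""
    return {"model": model, "prompt": prompt, "size": size, "quality": quality}

params = image_request_params(
    "Do a where's Waldo style image but it's "
    "where is the raccoon holding a ham radio"
)

# With the openai package installed, these would be passed straight through:
#   from openai import OpenAI
#   client = OpenAI()
#   result = client.images.generate(**params)
```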

> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure

I see an opportunity for a new AI test!

  • There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.

    It's a difficult test for GenAI to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo, Highlander-style), while also holding up to scrutiny when you examine any individual, ordinary figure.

  • I've actually been feeding them into Claude Opus 4.7 with its new high-resolution image inputs, with mixed results: in one case there was no raccoon, but the model was SURE there was one, told me it was definitely there, and yet couldn't find it.

Really hard to look at these images given how un-human-like the humans are. A few are OK, but a lot are disfigured or missing parts, and it's hard to find a raccoon in here.

Thanks for the image, I will see their faces in my nightmares.

  • This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially a "Where's Waldo?"-style scene, where by definition you're going to be examining individual faces very closely.

Like... this has things that AI will seemingly always be terrible at?

At some point the level of detail is utter garbo and always will be. A thoughtful artist might make some mistakes, but someone who put that much time into a drawing wouldn't have:

- Nightmarish screaming faces on most people

- A sign that seemingly points in both directions, or in the wrong one, for a lake, plus a first aid tent that doesn't exist

- A dog in the bottom left and another near the lake that look like some sort of fuzzy monstrosity...

It looks SO impressive before you try to take in any detail. The hand-selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...

We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier Where's Waldos??

  • AI will seemingly always be ...

    You do realize that the whole image generation field is barely 10 years old?

    I remember how I was able to generate MNIST digits for the first time about 10 years ago - that seemed almost like magic!

Damn. There’s a fun game app to make here ^^

  • Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.

    • Yes, it's not there yet. But nothing unsolvable. The first thing that comes to mind would be generating a smaller portion at the same resolution, then expanding through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.

      Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.

      Both significantly increase costs, but for the second approach, having what Images 2.0 can produce as an input could significantly improve the overall coherence.
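As a rough sketch of the grid-splitting step described above: the function below computes overlapping tile boxes for a 3840x2160 image, each of which could then be cropped (e.g. with Pillow), inpainted individually, and pasted back with blending over the overlap region. The tile size and overlap values are arbitrary assumptions, not anything a particular service requires:

```python
# Hedged sketch of the tile-then-inpaint idea: cover a large image with
# overlapping tiles so each one can be re-rendered at full detail, then
# pasted back. Tile and overlap sizes here are illustrative choices.

def grid_tiles(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes covering the image,
    stepped so adjacent tiles share an `overlap`-pixel seam for blending."""
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes

# The 4K size from the earlier command splits into a 5x3 grid of tiles.
tiles = grid_tiles(3840, 2160)
```

Each box would be cropped out, sent to an inpainting model, and composited back, ideally with a feathered mask across the overlap so seams don't show.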

5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."

(I don't think it's right).
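One way to sanity-check a claim like that is to convert the percentages into pixel coordinates and crop out that spot for a closer look. A small sketch, assuming the 3840x2160 image from the earlier command; the 200-pixel window is an arbitrary choice:

```python
# Convert the model's claimed "57% from the left, 48% from the top" into
# pixel coordinates on a 3840x2160 image, and build a crop box around it.
# The 200px half-window is an arbitrary zoom level, not from the thread.

def percent_to_crop(px, py, width, height, window=200):
    """Return a (left, top, right, bottom) crop box centred on the
    (px, py) fractional position, clamped to the image bounds."""
    cx, cy = int(px * width), int(py * height)
    return (max(cx - window, 0), max(cy - window, 0),
            min(cx + window, width), min(cy + window, height))

box = percent_to_crop(0.57, 0.48, 3840, 2160)
# box could be passed to Pillow's Image.crop() to eyeball the claimed spot.
```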