Comment by baltimore

3 months ago

Since the first (good) image generation models became available, I've been trying to get them to generate an image of a clock with 13 instead of the usual 12 hour divisions. I have not been successful. Usually they will just replace the "12" with a "13" and/or mess up the clock face in some other way.

I'd be interested if anyone else is successful. Share how you did it!

I've noticed that image models are particularly bad at modifying popular concepts in novel ways (way worse "generalization" than what I observe in language models).

  • Maybe LLMs always fail to generalize outside their data set, and it’s just less noticeable with written language.

    • This is it. They’re language models which predict next tokens probabilistically and a sampler picks one according to the desired "temperature". Any generalization outside their data set is an artifact of random sampling: happenstance and circumstance, not genuine substance.

    • Most image models are diffusion models, not LLMs, and have a bunch of other idiosyncrasies.

      So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.

      And the image models that are based on multimodal LLMs (like Nano Banana) seem to do a lot better at novel concepts.

    • They definitely don't completely fail to generalise. You can easily prove that by asking them something completely novel.

      Do you mean that LLMs might display a similar tendency to modify popular concepts? If so that definitely might be the case and would be fairly easy to test.

      Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?

      Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.

  • Also, they're fundamentally bad at math. They can draw a clock because they've seen clocks, but going further requires some calculations they can't do.

    For example, try asking Nano Banana to do something simpler, like "draw a picture of 13 circles." It likely will not work.

  Generate an image of a clock face, but instead of the usual 12 hour numbering, number it with 13 hours. 

Gemini 2.5 Flash, or "Nano Banana", or whatever we're calling it these days. https://imgur.com/a/1sSeFX7

A normal (ish) 12h clock. It numbered it twice, in two concentric rings. The outer ring is normal, but the inner ring numbers the 4th hour as "IIII" (fine, and a thing that clocks do) and the 8th hour as "VIIII" (wtf).

  • It should be pretty clear already that anything based on (limited to?) communicating in words/text can never grasp conceptual thinking.

    We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

    • > We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

      We have a very comprehensive and precise spec for that [0].

      If you don't want to hop through the certificate warning, here's the transcript:

      - Some day, we won't even need coders any more. We'll be able to just write the specification and the program will write itself.

      - Oh wow, you're right! We'll be able to write a comprehensive and precise spec and bam, we won't need programmers any more.

      - Exactly

      - And do you know the industry term for a project specification that is comprehensive and precise enough to generate a program?

      - Uh... no...

      - Code, it's called code.

      [0]: https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...

    • I don’t think that’s clear at all. In fact the proficiency of LLMs at a wide variety of tasks would seem to indicate that language is a highly efficient encoding of human thought, much more so than people used to think.

I gave this "riddle" to various models:

> The farmer and the goat are going to the river. They look into the sky and see three clouds shaped like: a wolf, a cabbage and a boat that can carry the farmer and one item. How can they safely cross the river?

Most of them just give the answer to the well-known river-crossing riddle. Some "feel" that something is off, but still have a hard time figuring out that the wolf, the boat, and the cabbage are just clouds.

That's just a patch to the training data.

Once companies see this starting to show up in the evals and criticisms, they'll go out of their way to fix it.

This is really cool. I tried to prompt Gemini, but every time I got the same picture. I don't know how to share a session (the way you can with ChatGPT), but the prompts were:

If a clock had 13 hours, what would be the angle between two of these 13 hours?

Generate an image of such a clock

No, I want the clock to have 13 distinct hours, with the angle between them as you calculated above

This is the same image. There need to be 13 hour marks around the dial, evenly spaced

... And its last answer was

You are absolutely right, my apologies. It seems I made an error and generated the same image again. I will correct that immediately.

Here is an image of a clock face with 13 distinct hour marks, evenly spaced around the dial, reflecting the angle we calculated.

And the very same clock, with 12 hours, and a 13th above the 12...

  • This is probably my biggest problem with AI tools, having played around with them more lately.

    "You're absolutely right! I made a mistake. I have now comprehensively solved this problem. Here is the corrected output: [totally incorrect output]."

    None of them ever seem to have the ability to say "I cannot seem to do this" or "I am uncertain if this is correct, confidence level 25%." The only time they will give up or refuse to do something is when they are deliberately programmed to censor for often dubious "AI safety" reasons. All other times, they come back again and again with extreme confidence while producing total garbage output.

  • You can click the share icon (the two-way branch icon; it doesn't look like Apple's share icon) under the image it generates to share the conversation.

    I'm curious if the clock image it was giving you was the same one it was giving me:

    https://gemini.google.com/share/780db71cfb73

    • Thanks for the tip about sharing!

      No, my clock was an old style one, to be put on a shelf. But at least it had a "13" proudly right above the "12" :)

      This reminds me of my kids when they were in kindergarten, bringing home art that needed extra explanation before you could tell what it was. But they were very proud!

I was able to have AI generate an image like this, but not via diffusion/autoregression; I had it write Python code to create the image.

ChatGPT made a nice-looking clock with matplotlib, though the code had some bugs it had to fix (the hours ran counter-clockwise). Gemini produced correct code one-shot; it used Pillow instead of matplotlib, but the result didn't look as nice.
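
For reference, here is a minimal sketch along those lines (my own, not the exact code either model produced): a matplotlib script that places the numerals by looping over the hour count, so that 12 vs. 13 hours is a single constant.

    # Minimal sketch (not either model's actual output): draw a clock face
    # with HOURS divisions by spacing the numerals 360/HOURS degrees apart.
    import numpy as np
    import matplotlib.pyplot as plt

    HOURS = 13  # set to 12 for a normal clock

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.set_aspect("equal")
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, linewidth=2))

    for i in range(1, HOURS + 1):
        # Hour HOURS sits at the top; the numerals run clockwise from there.
        theta = np.pi / 2 - 2 * np.pi * i / HOURS
        ax.text(0.82 * np.cos(theta), 0.82 * np.sin(theta), str(i),
                ha="center", va="center", fontsize=14)
        ax.plot([0.92 * np.cos(theta), np.cos(theta)],
                [0.92 * np.sin(theta), np.sin(theta)], color="black")

    ax.set_xlim(-1.1, 1.1)
    ax.set_ylim(-1.1, 1.1)
    ax.axis("off")
    fig.savefig("clock13.png")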

Weird, I never tried that. I tried all the usual tricks that usually work, including swearing at the model (this works scarily well with LLMs), and got nothing. I even tried going the opposite direction and asking for a 6-hour clock.

I do playing-card generation, and almost all models struggle beyond the "6 of X".

My working theory is that they were trained really hard to put 5 fingers on hands, but their counting drops off quickly.

That's because they literally cannot do that. Doing what you're asking requires an understanding of why the numbers on the clock face are where they are and what it would mean if there were an extra hour on the clock (i.e., that you would have to divide 360 by 13 to begin to understand where the numbers would go). AI models have no concept of anything that's not included in their training data. Yet people continue to anthropomorphize this technology and are surprised when it becomes obvious that it's not actually thinking.

  • The hope was for this understanding to emerge as the most efficient solution to the next-token prediction problem.

    Put another way, it was hoped that once the dataset got rich enough, developing this understanding would actually be more efficient for the neural network than memorizing the training data.

    The useful question to ask, if you believe the hope is not bearing fruit, is why. Point specifically to the absent data or the flawed assumption being made.

    Or more realistically, put in the creative and difficult research work required to discover the answer to that question.

  • It's interesting because if you asked them to write code to generate an SVG of a clock, they'd probably use a loop from 1 to 12, using sin and cos of the angle (the loop index over 12, times 2π) to place the numerals. They know how to do this, and so they basically understand the process that generates a clock face. And extrapolating from that to 13 hours is trivial (for a human). So the fact that they can't do this extrapolation on their own is very odd. (A sketch of such a loop is included after these replies.)

  • gpt-image-1 and Google Imagen understand prompts; they just don't have training data to cover these use cases.

    gpt-image-1 and Imagen are wickedly smart.

    The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

    • >> The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

      That's great, but I bet it can't tie its own shoes.

  • I wonder if you would have more success if you painstakingly described the shape and features of a clock in great detail but never used the words clock or time or anything that might give the AI the hint that they were supposed to output something like a clock.

    • And this is a problem for me. I guess that it would work, but as soon as the word "clock" appears, gone is the request because a clock HAS.12.HOURS.

      I use this a lot in cybersecurity when I need to do something "illegal". I am refused help until I say that I am doing research on cybersecurity. In that case, no problem.

  • The problem is more likely the tokenization of images than anything. These models do their absolute worst when pictures are involved, but are seemingly miraculous at generalizing with just text.

    • I wonder if it's because we mean different things by generalization.

      For text, "generalization" is still "generate text that conforms to all the usual rules of the language". For images of 13-hour clock faces, we're explicitly asking the LLM to violate the inferred rules of the universe.

      I think a good analogy would be asking an LLM to write in English, except the word "the" now means "purple". They will struggle to adhere to this prompt in a conversation.

  • Yes, the problem is that these so-called "world models" do not actually contain a model of the world, or of any world.
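
As a follow-up to the SVG comment above: here is a hedged sketch (my own illustration, not any model's actual output) of the kind of loop described there, placing each numeral with sin/cos so that going from 12 to 13 hours is a one-constant change.

    # Illustrative sketch: emit an SVG clock face by looping over the hours
    # and placing each numeral with cos/sin of its angle.
    import math

    HOURS = 13          # set to 12 for a normal clock face
    CX, CY, R = 100, 100, 90

    parts = [f'<circle cx="{CX}" cy="{CY}" r="{R}" fill="none" stroke="black"/>']
    for i in range(1, HOURS + 1):
        angle = 2 * math.pi * i / HOURS - math.pi / 2   # hour HOURS lands at the top
        x = CX + 0.8 * R * math.cos(angle)
        y = CY + 0.8 * R * math.sin(angle)              # SVG y grows downward, so this runs clockwise
        parts.append(f'<text x="{x:.1f}" y="{y:.1f}" text-anchor="middle" '
                     f'dominant-baseline="middle">{i}</text>')

    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
           f'{"".join(parts)}</svg>')
    with open("clock.svg", "w") as f:
        f.write(svg)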

Ah! This is so sad. The manager types won't be able to add an hour (actually, two) to the day even with AI.

I've been trying for the longest time and across models to generate pictures or cartoons of people with six fingers and now they won't do it. They always say they accomplished it, but the result always has 5 fingers. I hate being gaslit.

LLMs are terrible at out-of-distribution (OOD) tasks. You should use chain-of-thought suppression and give constraints explicitly.

My prompt to Grok:

---

Follow these rules exactly:

- There are 13 hours, labeled 1–13.

- There are 13 ticks.

- The center of each number is at angle: index * (360/13)

- Do not infer anything else.

- Do not apply knowledge of normal clocks.

Use the following variables:

HOUR_COUNT = 13

ANGLE_PER_HOUR = 360 / 13 // 27.692307°

Use index i ∈ [0..12] for hour marks:

angle_i = i * ANGLE_PER_HOUR

I want html/css (single file) of a 13-hour analog clock.

---

Output from Grok:

https://jsfiddle.net/y9zukcnx/1/
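
If anyone wants to sanity-check the tick placement in that fiddle, here is a tiny Python snippet (mine, not part of Grok's output) that just evaluates the formula from the prompt:

    # Evaluate the angles the prompt specifies: angle_i = i * (360 / 13).
    HOUR_COUNT = 13
    ANGLE_PER_HOUR = 360 / HOUR_COUNT  # ~27.692307 degrees

    for i in range(HOUR_COUNT):
        print(f"mark {i:2d}: {i * ANGLE_PER_HOUR:8.4f} deg")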