Comment by AIPedant

2 days ago

There have been a few cases where the LLM clearly did look at the EXIF, got the answer, then confabulated a bunch of GeoGuessr logic to justify the answer. Sometimes that's presented as deception/misalignment but that's a category error: "find the answer" and "explain your reasoning" are two distinct tasks, and LLMs are not actually smart enough to coherently link them. They do one autocomplete for generating text that finds the answer and a separate autocomplete for generating text that looks like an explanation.

> Sometimes that's presented as deception/misalignment but that's a category error: "find the answer" and "explain your reasoning" are two distinct tasks

Right, but if your answer to "explain your reasoning" is not a true representation of your reasoning, then you are being deceptive. If it doesn't "know" its reasoning, then the honest answer is that it doesn't know.

(To head off any meta-commentary on humans' inability to explain their own reasoning: they would at least be able to honestly describe whether they used the EXIF or actual semantic knowledge of the photograph.)

  • My point is that dishonesty/misalignment doesn't make sense for o3, which is not capable of being honest because it's not capable of understanding what words mean. It's like saying a monkey at a typewriter is being dishonest if it happens to write a falsehood.

    • You seem to be saying that only sentient beings can lie, which is too semantic for my tastes.

      But AI models can certainly 1) provide incorrect information, and even 2) reason that providing incorrect information is the best course of action.

I think an alternative explanation is that it could be "double-checking" the metadata. You could test that by providing images with deliberately manipulated metadata.
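One way to run that test would be something like the sketch below, which stamps a photo with GPS coordinates from somewhere else entirely before you upload it. This is a rough sketch using the piexif library; the file name, helper names, and coordinates are made-up examples.

```python
# Rough sketch: overwrite a photo's GPS EXIF tags with coordinates from a
# different country, then upload the result and see where the model says it is.
# Requires piexif (pip install piexif). File name, function names, and
# coordinates are placeholders.
import piexif


def to_dms_rationals(value: float):
    """Convert a decimal degree value into EXIF (deg, min, sec) rationals."""
    value = abs(value)
    d = int(value)
    m = int((value - d) * 60)
    s = round(((value - d) * 60 - m) * 60 * 100)
    return ((d, 1), (m, 1), (s, 100))


def spoof_gps(path: str, lat: float, lon: float) -> None:
    """Replace the GPS IFD of the JPEG at `path`, rewriting the file in place."""
    exif = piexif.load(path)
    exif["GPS"] = {
        piexif.GPSIFD.GPSLatitudeRef: b"N" if lat >= 0 else b"S",
        piexif.GPSIFD.GPSLatitude: to_dms_rationals(lat),
        piexif.GPSIFD.GPSLongitudeRef: b"E" if lon >= 0 else b"W",
        piexif.GPSIFD.GPSLongitude: to_dms_rationals(lon),
    }
    piexif.insert(piexif.dump(exif), path)


# e.g. take a street photo actually shot in Suriname and stamp it with
# coordinates in Norway before uploading it:
# spoof_gps("street_scene.jpg", 60.39, 5.32)
```

If the model then confidently "locates" the photo at the spoofed coordinates while citing visual clues, that looks like the confabulation pattern described above; if it flags the mismatch, it really is cross-checking the metadata against the image.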

Do you have links to any of those examples?

  • I have one link that illustrates what I mean: https://chatgpt.com/share/6802e229-c6a0-800f-898a-44171a0c7d... The line about "the latitudinal light angle that matches mid‑February at ~47 ° N." seems like pure BS to me, and in the reasoning trace it openly reads the EXIF (a sketch of what that gives it is at the end of this comment).

    A clearer example I don't have a link for (it was somewhere on Twitter): someone tested a photo from Suriname and o3 said one of the clues was left-hand traffic. But there was no traffic in the photo. "Left-hand traffic" is a very valuable GeoGuessr clue, and it seemed to me that once o3 read the Surinamese EXIF, it confabulated the traffic detail.

    It's pure stochastic parroting: given that you are playing GeoGuessr honestly, and given that the answer is Suriname, the conditional probability that you mention left-hand traffic is very high. So o3 autocompleted that for itself while "explaining" its "reasoning."
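    For reference, this is roughly what "reading the EXIF" hands the model before it has looked at a single pixel. A minimal sketch using Pillow; "photo.jpg" is a placeholder file name.

    ```python
    # Dump the date and GPS tags embedded in a photo, i.e. the metadata a model
    # can read without doing any visual reasoning at all.
    from PIL import Image
    from PIL.ExifTags import GPSTAGS

    img = Image.open("photo.jpg")
    exif = img.getexif()

    print("DateTime:", exif.get(306))    # tag 306 = DateTime in the 0th IFD
    gps_ifd = exif.get_ifd(0x8825)       # 0x8825 = GPSInfo IFD
    for tag_id, value in gps_ifd.items():
        print(GPSTAGS.get(tag_id, tag_id), value)
    ```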