Comment by AIPedant

10 days ago

I’ve also seen a few of those where it gets the answer right but uses reasoning based on confabulated details that weren’t actually in the photo (e.g. saying that a clue is that traffic drives on the left, but there is no traffic in the photo). It seems to me that it just generated a bunch of hopefully relevant tokens as a way to autocomplete the “This photo was taken in Bern” tokens.

I think the more innocuous explanation for both of these is what Anthropic discussed a week or so ago about LLMs not properly explaining themselves: reasoning models create text that looks like reasoning, which helps solve problems, but isn’t always a faithful description of how the model actually got to the answer.

A really good point: there’s no guarantee that the reasoning tokens correspond to what the model’s weights are actually computing.

In this case it seems unlikely to me that it would confabulate its EXIF read to back up an accurate “hunch”.

  • Agreed - to be clear, I was saying it confabulated an analysis of the photo’s visual details to back up its actual reasoning, which was reading the EXIF. I am not sure that “low‑slung pre‑Alpine ridge line, and the latitudinal light angle that matches mid‑February at ~47° N” is actually evident in the photo (the second point seems especially questionable), but that’s not what it used to determine the answer. It determined the answer first and then autocompleted an explanation of its reasoning that fit the answer.

    That’s why I mentioned the case where it made up things that weren’t in the photo: “drives on the left” is a valuable GeoGuessr clue, so if GPT reads the EXIF and determines the photo was taken in London, a GeoGuessr player given that answer would very likely mention it, and so GPT is likely to make that “observation” itself, even if it’s spurious for the specific photo.

    I just noticed that its explanation has a funny slip-up: I assume there is nothing in the actual photo that indicates the picture was taken in mid-February, but the model used the date from the EXIF in its explanation. Oops :)
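    (For anyone unfamiliar with what an “EXIF read” gives you: the EXIF block in a JPEG usually carries both a capture timestamp and, if the camera had a GPS fix, the coordinates, so that alone is enough to get the place and the date without looking at the pixels. A rough sketch of such a read using Pillow; the filename and the example timestamp in the comment are hypothetical:)

    ```python
    from PIL import ExifTags, Image  # pip install Pillow

    img = Image.open("photo.jpg")    # hypothetical filename
    exif = img.getexif()

    # Top-level IFD tags: the capture timestamp typically lives here
    # (or as DateTimeOriginal in the Exif sub-IFD).
    tags = {ExifTags.TAGS.get(k, k): v for k, v in exif.items()}
    print(tags.get("DateTime"))      # e.g. "2025:02:14 13:05:22"

    # GPS data sits in its own sub-IFD, stored as degrees/minutes/seconds.
    gps_ifd = exif.get_ifd(ExifTags.IFD.GPSInfo)
    gps = {ExifTags.GPSTAGS.get(k, k): v for k, v in gps_ifd.items()}
    print(gps.get("GPSLatitudeRef"), gps.get("GPSLatitude"),
          gps.get("GPSLongitudeRef"), gps.get("GPSLongitude"))
    ```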

> reasoning models create text that looks like reasoning, which helps solve problems, but isn’t always a faithful description of how the model actually got to the answer

Correct. Just more generated bullshit on top of the already generated bullshit.

I wish the bubble would pop already and someone would make an LLM that returns straight-up references to the training set instead of this anthropomorphic, conversation-like format.