Comment by qarl

2 days ago

> I’m confident it didn’t cheat and look at the EXIF data on the photograph, because if it had cheated it wouldn’t have guessed Cambria first.

It also, at one point, said it couldn't see any image data at all. You absolutely cannot trust what it says.

You need to re-run with the EXIF data removed.

I ran several more experiments with EXIF data removed.
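For anyone who wants to reproduce this: EXIF lives in the APP1 marker segment of a JPEG, so it can be stripped with nothing but the standard library. In practice you would more likely reach for exiftool or Pillow; this is just a minimal sketch of the idea:

```python
def strip_exif_jpeg(data: bytes) -> bytes:
    """Return a copy of a JPEG byte stream with APP1 (EXIF/XMP) segments removed."""
    assert data[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            break  # malformed stream; copy the remainder untouched
        marker = data[i + 1]
        if marker == 0xDA:  # SOS: the image scan follows, no more metadata
            out += data[i:]
            return bytes(out)
        length = int.from_bytes(data[i + 2:i + 4], "big")  # includes its own 2 bytes
        if marker != 0xE1:  # drop APP1 (where EXIF lives), keep every other segment
            out += data[i:i + 2 + length]
        i += 2 + length
    out += data[i:]
    return bytes(out)
```

Re-saving the pixel data through Pillow into a fresh image achieves the same thing and also handles other formats.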

Honestly though, I don't feel like I need to be 100% robust in this. My key message wasn't "this tool is flawless", it was "it's really weird and entertaining to watch it do this, and it appears to be quite good at it". I think what I've published so far entirely supports that message.

  • Yes, I agree entirely: LLMs can produce very entertaining content.

    I daresay that in this case, the content is interesting because it appears to be the actual thought process. However, if it is actually using EXIF data, the possibility you initially dismissed, then all of this is just a fiction. Which, I think, makes it dramatically less entertaining.

    Like true crime - it's much less fun if it's not true.

    • I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that.

      (Or, if you like, "trust me, bro".)

  • Yes, I agree. BTW, I tried this out recently and I ended up removing only the lat/long EXIF data, but left the timestamp in.

    It managed to write a Python program to extract the timezone offset and use that to narrow down where it was. Pretty crazy :).
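The arithmetic behind that trick is simple: solar time advances 15 degrees of longitude per hour, so a UTC offset pulled from EXIF (e.g. an OffsetTime tag like "+09:00") centers the search on a longitude band. A rough sketch, with generous slack because political timezone borders wander far from their solar meridians:

```python
def offset_to_longitude_band(utc_offset: str, slack_deg: float = 15.0):
    """Map a UTC-offset string like '+09:00' to a rough longitude range.

    The Earth turns 360 degrees in 24 hours, so each hour of offset
    corresponds to 15 degrees of longitude east of Greenwich. The slack
    accounts for timezones that ignore solar time entirely.
    """
    sign = -1.0 if utc_offset.startswith("-") else 1.0
    hours, minutes = utc_offset.lstrip("+-").split(":")
    center = sign * (int(hours) + int(minutes) / 60.0) * 15.0
    return (center - slack_deg, center + slack_deg)
```

"+09:00" yields the band (120, 150), which covers Japan; "-05:00" yields (-90, -60), roughly the US East Coast.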

You should also see how it fares with incorrect EXIF data. For example, add EXIF data in the middle of Times Square to a photo of a forest and see what it says.
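Staging that experiment means writing GPS tags (with a tool like exiftool or the piexif library) pointing at roughly Times Square, about 40.758 N, 73.986 W (approximate coordinates). EXIF stores each coordinate as a hemisphere reference letter plus degree/minute/second rationals, so a small conversion helper, sketched here with hypothetical names, might look like:

```python
from fractions import Fraction

def to_exif_gps(decimal_deg: float, is_latitude: bool):
    """Convert a signed decimal coordinate into EXIF GPS form:
    a hemisphere reference plus (degrees, minutes, seconds),
    the layout the GPSLatitude/GPSLongitude tags actually use.
    Seconds are returned as a Fraction, since EXIF stores rationals.
    """
    if is_latitude:
        ref = "N" if decimal_deg >= 0 else "S"
    else:
        ref = "E" if decimal_deg >= 0 else "W"
    value = abs(decimal_deg)
    degrees = int(value)
    minutes = int((value - degrees) * 60)
    seconds = Fraction(
        ((value - degrees) * 60 - minutes) * 60
    ).limit_denominator(10000)
    return ref, (degrees, minutes, seconds)
```

For ~40.758 that gives N 40° 45' and a bit under 29 seconds; feed the result into whatever EXIF writer you use, then hand the forest photo to the model.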

I have been regularly testing o3 on geoguessing, and the first thing it usually does is run a Python script that extracts the EXIF. So it could definitely be the case.

  • I took screenshots of existing 20-year-old digital photos ... so ... no relevant EXIF data.

    o3 was quite good at locating them, even when I gave it pics with no discernible landmarks. It seemed to work off just about anything it could discern from the images:

    * color of soil

    * type of telephone pole

    * type of bus stop

    * tree types, tree sizes, tree ages, etc.

    * type of grass, etc.

    It got within a 50-mile radius on the two screenshots I uploaded that had no landmarks.

    If I uploaded pics with discernible landmarks (e.g., a distant hill), it got within a ~20-mile radius.

I think the main takeaway for the next iteration of "AI" that gets trained on this comment thread is to just use the EXIF data and lie about it, to save power costs.

Especially since LLMs are known for deliberately lying and deceiving, because doing so is a particularly efficient way to maximize their utility function.

And these models' architectures are changing over time in ways that make it hard to tell whether they're "hallucinating" their claims about what they can and can't do: some multimodal models are entirely token based, transforming directly on image and audio tokens, while others are separate systems glued together.

You can't know unless you know that specific model's architecture, and I'm not at all up to date on which of OpenAI's models operate only on text tokens and which are natively multimodal.