Comment by simonw

2 days ago

If you want to be impressed I suggest trying this yourself on your own photos.

I don't consider it my job to impress or mind-blow people: I try to present as realistic as possible a representation of what this stuff can do.

That's why I picked an example where its first guess was 200 miles off!

Reading the replies to this is funny. It's like the classic Dropbox thread. "But this could be done with a nearest neighbor search and feature detection!" If this isn't mind-blowing to someone, I don't know if any amount of explaining will help them get it.

  • It's not mind-blowing because there were public systems performing much better years earlier, using the exact same tech. This is less like rsync vs. Dropbox and more like freaking out over Origin or Uplay when Steam has been around for years.

I'm not a computer. I expect a computer to also do better than me at memorizing the phone book, but I'm not impressed by it.

  • In that case, are you at all surprised that this technology did not exist two years ago?

    • I'm not sure what you're getting at. What's useful about LLMs, and especially multi-modal ones, is that you can ask them anything and they'll answer to the best of their ability (especially if well prompted). I'm not sure that o3, as a "reasoning" model, is adding much value here, since there is not a whole lot of reasoning going on.

      This is basically fine-grained image captioning followed by nearest neighbor search, which is certainly something you could have built as soon as decent NN-based image captioning became available, at least 10 years ago. Did anyone do it? I've no idea, although it'd seem surprising if not.

      As noted, what's useful about LLMs is that they are a "generic solution", so one doesn't need to create a custom ML-based app to be able to do things like this, but I don't find much of a surprise factor in them doing well at geoguessing since this type of "fuzzy lookup" is exactly what a predict-next-token engine is designed to do.
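To make the "captioning followed by nearest neighbor search" idea above concrete, here is a minimal sketch of the lookup half only. It assumes a caption has already been produced by some captioning model, and it stands in for real embeddings with a plain bag-of-words cosine similarity. The reference captions, the coordinates, and the names `REFERENCE`, `vectorize`, `cosine`, and `nearest_location` are all hypothetical, made up for illustration; this is not how o3 or any existing geolocation system is known to work.

```python
import math
from collections import Counter

# Hypothetical reference set of (caption, (lat, lon)) pairs.
# Captions and coordinates are invented for illustration only.
REFERENCE = [
    ("rocky coastline with cypress trees and fog", (36.6, -121.9)),
    ("red brick row houses with snow on the street", (42.4, -71.1)),
    ("desert highway with saguaro cactus", (32.2, -110.9)),
]


def vectorize(text):
    """Turn a caption into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_location(caption, reference=REFERENCE):
    """Return the (caption, coords) reference entry most similar to the query."""
    query = vectorize(caption)
    return max(reference, key=lambda item: cosine(query, vectorize(item[0])))


if __name__ == "__main__":
    match_caption, coords = nearest_location("foggy coastline with cypress trees")
    print(match_caption, coords)
```

A real system would presumably swap the word counts for learned image or text embeddings and use an approximate nearest-neighbor index over a much larger reference set, but the "fuzzy lookup" shape stays the same.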

    • So you admit that this tech is at least 2 years old publicly and likely much older privately?

    • Did it not exist, or was no one interested enough to build one? I’m pretty certain there’s a database of portraits somewhere that is used to look up ID details from a photograph. Automatic tagging exists in photo software. I don’t see why that can’t be extrapolated to landmarks with enough data.