Comment by HarHarVeryFunny
2 days ago
I'm not a computer. I expect a computer to also do better than me at memorizing the phone book, but I'm not impressed by it.
In that case, are you at all surprised that this technology did not exist two years ago?
I'm not sure what you're getting at. What's useful about LLMs, and especially multi-modal ones, is that you can ask them anything and they'll answer to the best of their ability (especially if well prompted). I'm not sure that o3, as a "reasoning" model, is adding much value here, since there is not a whole lot of reasoning going on.
This is basically fine-grained image captioning followed by nearest neighbor search, which is certainly something you could have built as soon as decent NN-based image captioning became available, at least 10 years ago. Did anyone do it? I've no idea, although it'd seem surprising if not.
As noted, what's useful about LLMs is that they are a "generic solution", so one doesn't need to build a custom ML-based app to do things like this. Still, I don't find it very surprising that they do well at geoguessing, since this type of "fuzzy lookup" is exactly what a predict-next-token engine is designed to do.
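To make the "captioning followed by nearest neighbor search" idea concrete, here is a minimal sketch of just the lookup half. It assumes each geotagged reference photo has already been captioned and the caption turned into an embedding vector; the place names, the 3-d vectors, and the helper names `cosine_similarity` and `nearest_location` are all hypothetical illustrations, not anything described in the thread or taken from a specific library.

```python
import math

# Hypothetical toy "database": embedding vectors for captions of geotagged
# reference photos. A real system would get these from a caption/text
# encoder; these 3-d vectors are made up purely for illustration.
REFERENCE = {
    "Kyoto, Japan": [0.9, 0.1, 0.0],
    "Reykjavik, Iceland": [0.0, 0.2, 0.95],
    "Lisbon, Portugal": [0.3, 0.9, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_location(query, reference=REFERENCE):
    """1-nearest-neighbor lookup: the place whose embedding is most similar to query."""
    return max(reference, key=lambda place: cosine_similarity(query, reference[place]))


# Hypothetical embedding of the caption generated for a new photo.
query_embedding = [0.25, 0.85, 0.2]
print(nearest_location(query_embedding))  # prints "Lisbon, Portugal"
```

A brute-force scan like this is only workable for a toy database; at scale one would use an approximate nearest-neighbor index, but the "fuzzy lookup" principle is the same.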
How does nearest neighbor search relate to this?
So you admit that this tech is at least 2 years old publicly and likely much older privately?
Did it not exist, or was no one interested enough to build one? I’m pretty certain there’s a database of portraits somewhere that gets searched for ID details from a photograph. Automatic tagging already exists in photo software. I don’t see why that can’t be extrapolated to landmarks, given enough data.
I think you are underestimating the importance of a "world model" in the process. It is the modeling of how all these details are related to each other that is critical here.
The LLM will have an edge by being able to draw on higher-level abstract concepts.
If it existed two years ago I certainly couldn't play with it on my phone.