Comment by onlyrealcuzzo
2 years ago
Is this something we really expect AI to get right with high accuracy with an image like that?
For one, there's a huge dark line, and it isn't clear to me what it is or what it means for street crossings.
I am definitely not confident I could answer that question correctly.
The answer Bard gave is not even very coherent, and GPT-4V gave very similar results. This makes me very curious how exactly these models "see". Are they intelligently following the route from its starting point all the way along, or are they just scanning the image top-to-bottom, left-to-right? Seemingly, the latter is the case.
I expected the AI to understand that, say, taking a right turn from a straight road onto another sub-road definitely involves a crossing (since I specified that one is running on the left of the road), and to try answering along those lines.
Maybe a heavily fine-tuned image AI would get this right.
I don't see a world in which a general model like GPT or Gemini gets stuff like this correct with high accuracy any time soon.