Comment by rounakdatta

2 years ago

I just tried out a vision reasoning task: https://g.co/bard/share/e8ed970d1cd7 and it hallucinated. Hello Deepmind, are you taking notes?

Is this something we really expect AI to get right with high accuracy with an image like that?

For one, there's a huge dark line that isn't even clear to me what it is and what that means for street crossings.

I am definitely not confident I could answer that question correctly.

  • The answer Bard gave is not even very coherent. I got very similar results with GPT-4V as well. This makes me very curious how exactly these models "see". Are they intelligently following the route from one point all the way along, or are they just tracing it top-to-bottom, left-to-right? Seemingly, the latter is the case.

    I expected the AI to understand that, say, taking a right turn from a straight road onto a sub-road definitely involves crossing (since I specified that one is running on the left side of the road), and to try answering along those lines.

    • Maybe a heavily fine-tuned image AI would get this right.

      I don't see a world in which a general model like GPT or Gemini gets stuff like this correct with high accuracy any time soon.

It's not at all clear what model you're getting from Bard right now.

  • ... though that is itself a concern with Bard right?

    • Sure, to some extent. It's inside baseball for 99% of users, but for the few who care or are curious there should be a "stats for nerds" button.

      Edited: now Bard is showing me a banner that says it is Gemini Pro.