Comment by rounakdatta

2 years ago

I just tried out a vision reasoning task: https://g.co/bard/share/e8ed970d1cd7 and it hallucinated. Hello Deepmind, are you taking notes?

Is this something we really expect AI to get right with high accuracy with an image like that?

For one, there's a huge dark line that isn't even clear to me what it is and what that means for street crossings.

I am definitely not confident I could answer that question correctly.

  • The answer Bard gave is not even very coherent. I got very similar results with GPT-4V as well. This makes me very curious how exactly these models "see". Are they intelligently following the route from one point all the way along, or are they just tracing it top-to-bottom, left-to-right? Seemingly, the latter is the case.

    I expected the AI to understand that, say, taking a right turn from a straight road onto a sub-road definitely involves crossing (since I specified that one is running on the left side of the road), and to try answering along those lines.

    • Maybe a heavily fine-tuned image AI would get this right.

      I don't see a world in which a general model like GPT or Gemini gets stuff like this correct with high accuracy any time soon.

It's not at all clear what model you're getting from Bard right now.

  • ... though that is itself a concern with Bard right?

    • Sure, to some extent. It's inside baseball for 99% of users, but for the few who care or are curious there should be a "stats for nerds" button.

      Edited: now Bard is showing me a banner that says it is Gemini Pro.